Abstract

An analysis of steganographic systems subject to the following perfect undetectability condition is presented 
in this paper. Following embedding of the message into the covertext, the resulting stegotext is required to have 
exactly the same probability distribution as the covertext. Then no statistical test can reliably detect the presence of 
the hidden message. We refer to such steganographic schemes as perfectly secure. A few such schemes have been 
proposed in recent literature, but they have vanishing rate. We prove that communication performance can potentially 
be vastly improved; specifically, our basic setup assumes independently and identically distributed (i.i.d.) covertext, 
and we construct perfectly secure steganographic codes from public watermarking codes using binning methods and 
randomized permutations of the code. The permutation is a secret key shared between encoder and decoder. We derive 
(positive) capacity and random-coding exponents for perfectly-secure steganographic systems. The error exponents 
provide estimates of the code length required to achieve a target low error probability. 

In some applications, steganographic communication may be disrupted by an active warden, modelled here by a 
compound discrete memoryless channel. The transmitter and warden are subject to distortion constraints. We address 
the potential loss in communication performance due to the perfect-security requirement. This loss is the same as 
the loss obtained under a weaker order-1 steganographic requirement that would just require matching of first-order 
marginals of the covertext and stegotext distributions. Furthermore, no loss occurs if the covertext distribution is 
uniform and the distortion metric is cyclically symmetric; steganographic capacity is then achieved by randomized 
linear codes. Our framework may also be useful for developing computationally secure steganographic systems that 
have near-optimal communication performance. 

Index Terms 

Steganography, watermarking, secret communication, timing channels, capacity, reliability function, error exponents, binning codes, randomized codes, universal codes.
I. Introduction

Information embedding refers to the embedding of data within a cover object (also referred to as covertext) 
such as image, video, audio, graphics, text, or packet transmission times [1]-[5]. Applications include copyright
protection, database annotation, transaction tracking, traitor tracing, timing channels, and multiuser communications. 
These applications often impose the requirement that embedding only slightly perturbs the covertext. The name 
watermarking has been widely used to describe information embedding techniques that are perceptually transparent, 
i.e., the marked object (after embedding) is perceptually similar to the cover object. 

In some applications, the presence of the embedded information should be kept secret (see applications below). 
Then perceptual transparency is not sufficient, because statistical analysis could reveal the presence of hidden 
information. The problem of embedding information that is hard to detect is called steganography, and the marked 
object is called stegotext [3], [4], [6]-[8]. Steganography differs from cryptography in that the presence of the 
message needs to remain secret, rather than the value of the message. The dual problem to steganography is 
steganalysis, that is, detection of hidden information within a stegotext. 

A classical model for steganography is Simmons' prisoner problem [9]. Alice and Bob are locked up in different
cells but are allowed to communicate under the vigilant eye of Willie, the prison warden. If Willie detects the 
presence of hidden information in the transmitted data, he terminates their communication and subjects them to a 
punishment. Willie is a passive warden if he merely observes and analyzes the transmitted data. He is an active 
warden if he introduces noise to make Alice and Bob's task more difficult. 

In the information age, there are several application scenarios for steganography. 

1) Steganography may be used to communicate over public networks such as the Internet. One may embed bits 
into inconspicuous files that are routinely sent over such networks: images, video, audio files, etc. Users of 
such technology may include intelligence and military personnel, people that are subject to censorship, and 
more generally, people who have a need for privacy. 

2) Steganography may also be used to communicate over private networks. For instance, confidential documents 
within a commercial or governmental organization could be marked with identifiers that are hard to detect. 
The purpose is to trace unauthorized use of a document to a particular person who received a copy of this 
document. The recipient of the marked documents should not be aware of the presence of these identifiers. 

3) Timing channels can be used to leak information out of computers. A pirate could modify the timing of
packets sent by the computer, encoding data that reside on that computer. The pirate wishes to make this
information leakage undetectable to avoid arousing suspicion. To disrupt potential information leakage, the
network could jam packet timings; hence the network plays the role of an active warden.

The channel over which the stegotext is transmitted could be noiseless or noisy, corresponding to the case of a 
passive and an active warden, respectively. Moreover, the steganographer's ability to choose the covertext is often 
limited if not altogether nonexistent. In the private-network application above, the covertext is generated by a 
content provider, not by the steganographer (i.e., the authority responsible for document security). Similarly in the 
timing-channel application, the covertext is generated by the computer, not by the pirate.
In view of these applications, the four basic attributes of a steganographic code are: 

1) detectability: quantifying Willie's ability to detect the presence of hidden information; 

2) transparency (fidelity): closeness of covertext and stegotext under an appropriate distortion (fidelity) metric; 

3) payload: the number of bits embedded in the covertext; and 

4) robustness: quantifying decoding reliability in presence of channel noise (i.e., when Willie is an active 
warden). 

If Alice had complete freedom for choosing the covertext, the transparency requirement would be immaterial. A 
covertext would not even be needed: it would suffice for Alice to generate objects that follow a prescribed covertext 
distribution. This model has two shortcomings: (a) as mentioned above, in some applications Alice has little or 
no control over the choice of the covertext; (b) even if she has, covertexts have complicated distributions, and 
generating a size-M steganographic code by sampling the covertext distribution would be highly impractical for 
large M. 

Information theory is a natural framework for studying steganography and steganalysis. Assuming a statistical 
model is available for covertexts, the only truly secure strategy from the steganographer's point of view is to 
ensure that the probability distributions of the covertext and stegotext are identical. This strong notion of security 
was proposed by Cachin [10] and is the steganographic counterpart of Shannon's notion of perfect security in 
cryptography. We refer to steganography that satisfies this strong property as perfectly secure. 

If Alice is allowed to select the covertext and Willie is passive, Alice may use the following perfectly secure
steganographic code [10]. Alice and Bob agree on a hash function, and the value of the hashed stegotext is the
message to be transmitted. Alice searches a database of covertexts until she finds one that matches the desired 
hash value. This approach is perfectly secure irrespective of the distribution of the covertext. The disadvantages are 
that the search is computationally infeasible for large message sets (communication rate is extremely low), and the 
underlying communication model is limited, as discussed above. 
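The hash-search construction can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-256 from Python's `hashlib` as the agreed hash and a $k$-bit message; `candidate_covertexts` is a hypothetical stand-in for Alice's database that yields random bytes only so the example runs (in the scheme described above, the candidates would be genuine covertexts). The sketch also makes the rate problem concrete: each candidate matches with probability $2^{-k}$, so Alice inspects $2^k$ candidates on average.

```python
import hashlib
import os
from typing import Iterator, Tuple


def hash_bits(covertext: bytes, k: int) -> int:
    """Return the k low-order bits of SHA-256(covertext), read as an integer."""
    digest = hashlib.sha256(covertext).digest()
    return int.from_bytes(digest, "big") & ((1 << k) - 1)


def candidate_covertexts(size: int = 32) -> Iterator[bytes]:
    """Hypothetical stand-in for a database of innocuous covertexts.

    Random bytes are used only so that the example is runnable."""
    while True:
        yield os.urandom(size)


def embed(message: int, k: int) -> Tuple[bytes, int]:
    """Search candidates until one hashes to `message`; return it and the search cost."""
    for tries, c in enumerate(candidate_covertexts(), start=1):
        if hash_bits(c, k) == message:
            return c, tries
    raise RuntimeError("unreachable: the candidate generator is infinite")


def extract(stegotext: bytes, k: int) -> int:
    """Bob recovers the message by hashing the received stegotext."""
    return hash_bits(stegotext, k)


if __name__ == "__main__":
    k, m = 8, 0b10110011
    x, cost = embed(m, k)
    # The expected search cost is 2**k candidates (geometric with success probability 2**-k).
    print(extract(x, k) == m, cost)
```

Doubling the payload from $k$ to $2k$ bits squares the expected search cost, which is why this code has an extremely low communication rate.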

Cachin also proposed two less stringent requirements for steganographic codes [10]. One is ε-security, where the
Kullback-Leibler divergence between the covertext and stegotext probability distributions is smaller than ε (perfect
security requires ε = 0). The other applies to random processes: he redefined perfectly secure steganography by
requiring that the above Kullback-Leibler divergence, normalized by the length N of the covertext sequence, tend to
zero as N → ∞. Unfortunately, this does not preclude the possibility that the unnormalized Kullback-Leibler divergence
remains bounded away from zero, or even grows to infinity (at a rate slower than N) as N → ∞. In such cases,
Willie's error probability may tend to zero asymptotically, and the perfect-security terminology is then misleading.
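A hypothetical numerical example (not taken from [10]) makes the gap concrete. Let the covertext be i.i.d. Bernoulli(1/2) and the stegotext i.i.d. Bernoulli($1/2 + N^{-1/4}$), with $N > 16$. Because both laws are products, the divergence between the $N$-dimensional distributions is $N$ times the per-letter divergence $D(\mathrm{Bern}(1/2)\,\|\,\mathrm{Bern}(1/2+\delta)) = -\tfrac12\log_2(1-4\delta^2)$, which is of order $N^{-1/2}$ here. The normalized divergence therefore vanishes while the total divergence grows like $\sqrt{N}$. The sketch below evaluates both quantities in bits.

```python
import math


def kl_bernoulli_bits(p: float, q: float) -> float:
    """D(Bern(p) || Bern(q)) in bits; assumes 0 < q < 1."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * math.log2(a / b)
    return total


def divergences(n: int) -> tuple[float, float]:
    """Return (normalized, total) divergence between the covertext law Bern(1/2)^n
    and the hypothetical stegotext law Bern(1/2 + n**-0.25)^n."""
    delta = n ** -0.25
    per_letter = kl_bernoulli_bits(0.5, 0.5 + delta)  # additivity over i.i.d. letters
    return per_letter, n * per_letter


for n in (10**2, 10**4, 10**6):
    norm, total = divergences(n)
    print(f"N={n:>8}  D/N={norm:.2e} bits  D={total:.1f} bits")
```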

While Cachin focused on security and not on communication performance in terms of payload, robustness
and fidelity, Kullback-Leibler divergence has become a popular metric for assessing the security of practical
steganographic schemes subject to transparency, payload, and robustness requirements [11]-[18]. Other metrics 
are studied in [19]-[25]. 

The tradeoffs between detectability, fidelity, payload, and robustness can be studied in an information-theoretic
framework. The basic mathematical model for steganography is communication with side information at the
encoder [26]. Moulin and O'Sullivan studied a general information-theoretic framework for information hiding
and indicated its applicability to steganography [27, Section VII.C]. However, they did not study perfectly secure
steganography and did not derive expressions for steganographic capacity. Galand and Kabatiansky [28] constructed
steganographic binary codes, but the code rate vanishes as N → ∞. Fridrich et al. [29], [30] proposed positive-rate
"wet paper" codes, which permit a change from the original cover distribution to a new stegodistribution. However,
they did not analyze the fundamental tradeoffs between payload, robustness, and detectability.

The goal of this paper is therefore to study the information-theoretic limits of perfectly undetectable steganography.
As a first step towards this problem, we assume that covertext samples are independently and identically distributed 
(i.i.d.) over a finite alphabet. In practice the i.i.d. model could be applied to transform coefficients or to blocks 
of coefficients. While this is just a simplifying approximation to actual statistics, it allows us to derive tangible 
mathematical results and to understand the effects of the perfect security constraint on transparency, payload, and 
robustness. Our first result is a connection between public watermarking codes [27], [31], [32] and perfectly secure 
steganographic codes. Given any public watermarking code that preserves the first-order statistics of the covertext 
(this property will be referred to as order-1 security), we show that a perfectly secure steganographic code with the 
same error probability can be constructed using randomization over the set of all permutations of {1, 2, ..., N}. We
use this construction to derive capacity and random-coding exponent formulas for perfectly secure steganography. 

The codes that achieve capacity and random-coding exponents are stacked-binning schemes as proposed in [33] 
for general problems of channel coding with side information. The random-coding exponent yields an asymptotic 
upper bound on achievable error probability and therefore serves as an estimate of the code length required to 
achieve a target low error probability. A stacked-binning code consists of a stack of variable-size codeword arrays 
indexed by the type of the covertext sequence, and the corresponding decoder is a maximum penalized mutual
information (MPMI) decoder. The analysis is based on the method of types [34], [35].

Due to the added perfect-security constraint, capacity and random-coding exponent for steganography cannot 
exceed those of the corresponding public watermarking problem. Nevertheless, we have identified a class of problems 
where the covertext probability mass function (PMF) is uniform and the distortion function is symmetric, with the 
property that the perfect undetectability constraint does not cause any capacity loss. One special example in the 
general class is the case of Bernoulli(1/2) covertexts with the Hamming distortion metric [36]. For the
binary-Hamming case, the perfect security condition affects neither the capacity nor the random-coding error exponent.
Steganographic capacity is achieved by randomized nested linear codes. 

This paper is organized as follows. Section II describes the notation, and Section III the problem statement.
Section IV shows how perfectly secure steganographic codes can be constructed from codes with the much weaker
order-1 security. Section V presents our main theorems on capacity and random-coding error exponent. Section VI
discusses the role of secret keys in steganographic codes. Simplified results for the no-attack case are stated in
Section VII. A class of steganography problems for which perfect security comes at no cost is studied in Section VIII.
As an example of this class, the binary-Hamming problem is studied in Section IX. The paper concludes with a
discussion in Section X.



II. Notation 



We use uppercase letters for random variables, lowercase letters for their individual values, and boldface letters
for sequences. The PMF of a random variable $X \in \mathcal{X}$ is denoted by $p_X = \{p_X(x),\ x \in \mathcal{X}\}$. The entropy of a
random variable $X$ is denoted by $H(X)$, and the mutual information between two random variables $X$ and $Y$ is
denoted by $I(X;Y) = H(X) - H(X|Y)$. Should the dependency on the underlying PMFs be explicit, we use the
PMFs as subscripts, e.g., $H_{p_X}(X)$ and $I_{p_X p_{Y|X}}(X;Y)$. The Kullback-Leibler divergence between two PMFs $p$
and $q$ is denoted by $D(p\|q)$; the conditional Kullback-Leibler divergence of $p_{Y|X}$ and $q_{Y|X}$ given $p_X$ is denoted
by $D(p_{Y|X}\|q_{Y|X}|p_X) = D(p_{Y|X}p_X\|q_{Y|X}p_X)$.

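Let $p_{\mathbf{x}}$ denote the empirical PMF on $\mathcal{X}$ induced by a sequence $\mathbf{x} \in \mathcal{X}^N$. Then $p_{\mathbf{x}}$ is called the type of $\mathbf{x}$. The
type class $T_{\mathbf{x}}$ associated with $p_{\mathbf{x}}$ is the set of all sequences of type $p_{\mathbf{x}}$. Likewise, we define the joint type $p_{\mathbf{xy}}$ of
a pair of sequences $(\mathbf{x},\mathbf{y}) \in \mathcal{X}^N \times \mathcal{Y}^N$ and the type class $T_{\mathbf{xy}}$ associated with $p_{\mathbf{xy}}$. The conditional type $p_{\mathbf{y}|\mathbf{x}}$ of
a pair of sequences $(\mathbf{x},\mathbf{y})$ is defined as $p_{\mathbf{xy}}(x,y)/p_{\mathbf{x}}(x)$ for all $x \in \mathcal{X}$ such that $p_{\mathbf{x}}(x) > 0$. The conditional type class
$T_{\mathbf{y}|\mathbf{x}}$ given $\mathbf{x}$ is the set of all sequences $\mathbf{y}$ such that $(\mathbf{x},\mathbf{y}) \in T_{\mathbf{xy}}$. We denote by $H(\mathbf{x})$ the empirical entropy for
$\mathbf{x}$, i.e., the entropy of the empirical PMF $p_{\mathbf{x}}$. Similarly, we denote by $I(\mathbf{x};\mathbf{y})$ the empirical mutual information for
the joint PMF $p_{\mathbf{xy}}$. The above notation for types is adopted from Csiszár and Körner [34].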
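The empirical quantities above are straightforward to compute from symbol counts. The sketch below (plain Python; the function names are illustrative, not from the paper) computes a type, a joint type, a conditional type, the empirical entropy, and the empirical mutual information in bits, using $I(\mathbf{x};\mathbf{y}) = H(\mathbf{x}) + H(\mathbf{y}) - H(\mathbf{x},\mathbf{y})$ evaluated on the joint type.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def type_of(x: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Empirical PMF (type) p_x of a sequence x."""
    n = len(x)
    return {a: c / n for a, c in Counter(x).items()}


def joint_type(x, y):
    """Joint type p_xy of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return type_of(list(zip(x, y)))


def conditional_type(x, y):
    """Conditional type p_{y|x}(b|a) = p_xy(a, b) / p_x(a), for letters a with p_x(a) > 0."""
    px = type_of(x)
    return {(a, b): pab / px[a] for (a, b), pab in joint_type(x, y).items()}


def entropy_bits(p) -> float:
    """Entropy in bits of a PMF given as a dict of probabilities."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def empirical_mi(x, y) -> float:
    """Empirical mutual information I(x; y) = H(x) + H(y) - H(x, y), in bits."""
    return entropy_bits(type_of(x)) + entropy_bits(type_of(y)) - entropy_bits(joint_type(x, y))
```

For example, with $\mathbf{x} = (0,0,1,1)$ and $\mathbf{y} = (0,1,1,1)$ the conditional type is $p_{\mathbf{y}|\mathbf{x}}(1|1) = 1$ and $p_{\mathbf{y}|\mathbf{x}}(0|0) = p_{\mathbf{y}|\mathbf{x}}(1|0) = 1/2$, and $I(\mathbf{x};\mathbf{y}) = 1 + h(1/4) - 3/2 \approx 0.311$ bits.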

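We let $U(\Omega)$ denote the uniform PMF over a finite set $\Omega$. We let $\mathcal{P}_{\mathcal{X}}$ and $\mathcal{P}_{\mathcal{X}}^{[N]}$ represent the set of all PMFs and
all empirical PMFs, respectively, on the alphabet $\mathcal{X}$. Likewise, $\mathcal{P}_{\mathcal{Y}|\mathcal{X}}$ and $\mathcal{P}_{\mathcal{Y}|\mathcal{X}}^{[N]}$ denote the set of all conditional
PMFs and all empirical conditional PMFs on the alphabet $\mathcal{Y}$. We use $\mathbb{E}$ to denote mathematical expectation.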

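The shorthands $a_N \doteq b_N$, $a_N \,\dot{\le}\, b_N$, and $a_N \,\dot{\ge}\, b_N$ are used to denote asymptotic equalities and inequalities in
the exponential scale for $\lim_{N\to\infty}\frac{1}{N}\log\frac{a_N}{b_N} = 0$, $\limsup_{N\to\infty}\frac{1}{N}\log\frac{a_N}{b_N} \le 0$, and $\liminf_{N\to\infty}\frac{1}{N}\log\frac{a_N}{b_N} \ge 0$,
respectively. We define $|t|^+ = \max(t, 0)$, $\exp_2(t) = 2^t$, and the binary entropy function

$$h(t) \triangleq -t\log t - (1-t)\log(1-t), \quad t \in [0, 1].$$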



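We use $\ln x$ to denote the natural logarithm of $x$, and the logarithm $\log x$ is in base 2 if not specified otherwise.
The notation $\mathbb{1}\{A\}$ is the indicator function of the event $A$:

$$\mathbb{1}\{A\} = \begin{cases} 1 & \text{if } A \text{ is true;} \\ 0 & \text{else.} \end{cases}$$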



Finally, we adopt the notational convention that the minimum (resp. maximum) of a function over an empty set is
$+\infty$ (resp. $0$).



III. Problem Statement

Fig. 1: Communication-theoretic view of perfectly secure steganography.

Referring to Fig. 1, the covertext is modelled as a sequence $\mathbf{S} = (S_1, \dots, S_N)$ of i.i.d. samples drawn from
a PMF $\{p_S(s),\ s \in \mathcal{S}\}$. A message $M$ is to be embedded in $\mathbf{S}$ and transmitted to a decoder; $M$ is uniformly
distributed over a message set $\mathcal{M}$. The encoder produces a stegotext $\mathbf{X}$ through a function $f_N(\mathbf{S}, M)$, in an attempt
to transmit the message $M$ to the decoder reliably. The covertext and stegotext are required to be close according to
some distortion metric. The distortion model is motivated by the fact that stegotext and covertext represent physical
signals (such as images, text, etc.) which can be modified to some extent without affecting perceptual quality [27].
The strength of the transparency constraint is controlled by a distortion parameter.

A steganalyzer observes $\mathbf{X}$ and tests whether $\mathbf{X}$ is drawn i.i.d. from $p_S$. If not, Willie, the steganalyzer, terminates
the transmission, and the decoder is then unable to retrieve $M$. If $\mathbf{X}$ is deemed innocuous, Willie may simply
forward it to the decoder. In this case, Willie is a passive warden. To further hinder reliable transmission of hidden
messages, Willie may instead pass $\mathbf{X}$ through some attack channel $p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$, thereby producing a corrupted
text $\mathbf{Y}$. In this case, Willie is an active warden, and the corrupted text and the stegotext are also required to be close
according to some distortion metric. The alphabets $\mathcal{S}$, $\mathcal{X}$, and $\mathcal{Y}$ for $S$, $X$, and $Y$, respectively, are assumed to be identical.

The decoder does not know the attack channel $p_{\mathbf{Y}|\mathbf{X}}$ selected by the steganalyzer and does not have access to the original
covertext $\mathbf{S}$. The decoder produces an estimate $\hat M = \phi_N(\mathbf{Y}) \in \mathcal{M}$ of the transmitted message. We assume that the
encoder/decoder pair $(f_N, \phi_N)$ is randomized, i.e., the choice of $(f_N, \phi_N)$ is a function of a random variable
known only to the encoder and decoder but not to the steganalyzer. We can think of this random variable as a secret
key as in [27], [31], [32]. Note that in generic information-hiding games, this secret key provides some protection
against adversaries with arbitrary memory and unlimited computational resources [4, Section X]. In steganography,
the secret key plays a fundamental role in ensuring perfect undetectability: the covertext and the stegotext have the
same PMF when the secret key is carefully designed. The randomized code will be denoted by $(F_N, \Phi_N)$ with a
joint distribution $p(f_N, \phi_N)$.

A. Steganographic Codes

A distortion function is any nonnegative function $d : \mathcal{S} \times \mathcal{S} \to \mathbb{R}^+ \cup \{0\}$. This definition is extended to length-$N$
vectors using $d^N(\mathbf{s}, \mathbf{x}) = \frac{1}{N}\sum_{i=1}^{N} d(s_i, x_i)$. Let $D_{\max} = \max_{s,x \in \mathcal{S}} d(s, x)$. We assume without loss of generality that
$d(s, x) \ge 0$, with equality if $s = x$.
Definition 1: A length-$N$ perfectly secure steganographic code with maximum distortion $D_1$ is a triple $(\mathcal{M}, F_N, \Phi_N)$,
where

• $\mathcal{M}$ is the message set of cardinality $|\mathcal{M}|$;

• $(F_N, \Phi_N)$ has a joint distribution $p(f_N, \phi_N)$;

• $f_N : \mathcal{S}^N \times \mathcal{M} \to \mathcal{X}^N$ maps covertext $\mathbf{s}$ and message $m$ to stegotext $\mathbf{x} = f_N(\mathbf{s}, m)$. The mapping is subject
to the maximum distortion constraint

$$d^N(\mathbf{s}, f_N(\mathbf{s}, m)) \le D_1 \quad \text{almost surely} \tag{1}$$

and the perfect undetectability constraint

$$p_{\mathbf{X}} = p_{\mathbf{S}}; \tag{2}$$

• $\phi_N : \mathcal{Y}^N \to \mathcal{M}$ maps the received sequence $\mathbf{y}$ to a decoded message $\hat m = \phi_N(\mathbf{y})$.

The above definition is similar to the definitions for a length-$N$ data-embedding or watermarking code in [27],
[31], [32], with the additional steganographic constraint of (2), which requires perfect matching of $N$-dimensional
distributions. Also observe that the distortion constraint is inactive if $D_1 \ge D_{\max}$, i.e., the covertext $\mathbf{S}$ available
to Alice plays no role. Given $p_S$, define the set of conditional PMFs $p_{X|S}$ such that the marginals of $p_S p_{X|S}$
equal $p_S$ (i.e., $p_X = p_S$) and the expected distortion between $S$ and $X$ does not exceed $D_1$:

$$\mathcal{Q}^{\mathrm{steg}}(p_S, D_1) \triangleq \Big\{ p_{X|S} : \sum_{s,x} p_{X|S}(x|s)\,p_S(s)\,d(s,x) \le D_1,\ \ p_X(x) = \sum_{s} p_{X|S}(x|s)\,p_S(s) = p_S(x),\ \forall x \in \mathcal{S} \Big\}. \tag{3}$$
Also recall that in Def. 1, randomization of $(F_N, \Phi_N)$ is realized via a cryptographic key shared by encoder and
decoder.
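As a concrete check of (3), the following sketch tests whether a given $p_{X|S}$ lies in $\mathcal{Q}^{\mathrm{steg}}(p_S, D_1)$ for a finite alphabet $\{0, \dots, |\mathcal{S}|-1\}$, with conditional PMFs stored as row-stochastic lists. The helper names (`in_q_steg`, `bsc`, `hamming`) and the binary-symmetric example are illustrative assumptions, not constructions from the paper.

```python
from typing import Callable, List, Sequence


def hamming(a: int, b: int) -> float:
    """Hamming distortion between two letters."""
    return float(a != b)


def bsc(delta: float) -> List[List[float]]:
    """Binary symmetric channel with crossover probability delta."""
    return [[1 - delta, delta], [delta, 1 - delta]]


def in_q_steg(p_s: Sequence[float],
              p_x_given_s: Sequence[Sequence[float]],
              d: Callable[[int, int], float],
              d1: float,
              tol: float = 1e-12) -> bool:
    """Check membership of p_{X|S} in Q^steg(p_S, D1) as defined in (3).

    p_x_given_s[s][x] is the probability of stegotext letter x given covertext letter s.
    """
    k = len(p_s)
    # Each row must itself be a PMF.
    for row in p_x_given_s:
        if any(v < -tol for v in row) or abs(sum(row) - 1) > tol:
            return False
    # Expected distortion constraint.
    expected_dist = sum(p_s[s] * p_x_given_s[s][x] * d(s, x)
                        for s in range(k) for x in range(k))
    # Steganographic (order-1) constraint p_X = p_S.
    p_x = [sum(p_s[s] * p_x_given_s[s][x] for s in range(k)) for x in range(k)]
    return expected_dist <= d1 + tol and all(abs(p_x[x] - p_s[x]) <= tol for x in range(k))
```

For a uniform binary covertext with Hamming distortion, a BSC with crossover $D_1$ passes both tests. The same channel applied to a Bernoulli(0.3) covertext moves the marginal to $(0.66, 0.34)$ and therefore fails the steganographic constraint, even when the distortion budget is generous.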

Next, we define conditionally-constant-composition (CCC) and randomly modulated (RM) codes, which will be used to construct perfectly secure steganographic codes.

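Definition 2: (CCC Code). A length-$N$ code with conditionally constant composition, order-1 steganographic
property, and maximum distortion $D_1$ is a quadruple $(\mathcal{M}, \Lambda, f_N, \phi_N)$, where $\Lambda$ is a mapping from $\mathcal{P}_{\mathcal{S}}^{[N]}$ to $\mathcal{P}_{\mathcal{X}|\mathcal{S}}^{[N]}$.
The transmitted sequence $\mathbf{x} = f_N(\mathbf{s}, m)$ has conditional type $p_{\mathbf{x}|\mathbf{s}} = \Lambda(p_{\mathbf{s}})$. Moreover, $\Lambda(p_{\mathbf{s}}) \in \mathcal{Q}^{\mathrm{steg}}(p_{\mathbf{s}}, D_1)$.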

Observe that such a code matches the first-order empirical marginal PMF of the covertext, but not necessarily 
higher-order empirical marginals. Hence such a code generally does not satisfy the perfect-undetectability property. 



Definition 3: (RM Code). A length-$N$ randomly modulated code is the randomized code defined via permutations
of a prototype $(f_N, \phi_N)$:

$$\mathbf{x} = F_N(\mathbf{s}, m) = \pi^{-1} f_N(\pi\mathbf{s}, m) \tag{4}$$

$$\Phi_N(\mathbf{y}) = \phi_N(\pi\mathbf{y}), \tag{5}$$

where $\pi$ is drawn uniformly from the set $\Pi$ of all $N!$ permutations and is not revealed to Willie. The sequence $\pi\mathbf{x}$
is obtained by applying $\pi$ to the elements of $\mathbf{x}$.
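The permutation mechanics of (4) and (5) fit in a few lines of Python. In this sketch, `swap_prototype` is a hypothetical toy prototype for binary covertexts, chosen only because it preserves the type of $\mathbf{s}$ and its conditional type depends only on $p_{\mathbf{s}}$ (one 0 and one 1 are exchanged, at positions selected by $m$). `random.Random` stands in for the shared secret key; it is not a cryptographically secure source.

```python
import random
from typing import List, Sequence


def apply_perm(pi: Sequence[int], v: Sequence[int]) -> List[int]:
    """(pi v)_i = v_{pi(i)}: reorder the entries of v according to pi."""
    return [v[j] for j in pi]


def invert_perm(pi: Sequence[int]) -> List[int]:
    """Permutation that undoes apply_perm(pi, .)."""
    inv = [0] * len(pi)
    for i, j in enumerate(pi):
        inv[j] = i
    return inv


def rm_encode(f, pi, s, m):
    """Eq. (4): x = pi^{-1} f_N(pi s, m)."""
    return apply_perm(invert_perm(pi), f(apply_perm(pi, s), m))


def rm_decode(phi, pi, y):
    """Eq. (5): m_hat = phi_N(pi y)."""
    return phi(apply_perm(pi, y))


def swap_prototype(s, m):
    """Hypothetical binary prototype: swap one 0 and one 1 chosen by m.

    The output keeps the type of s, and the conditional type of x given s
    depends only on the type of s."""
    zeros = [i for i, b in enumerate(s) if b == 0]
    ones = [i for i, b in enumerate(s) if b == 1]
    x = list(s)
    if zeros and ones:
        i, j = zeros[m % len(zeros)], ones[m % len(ones)]
        x[i], x[j] = 1, 0
    return x


rng = random.Random(0)          # illustrative only: a real key must be a shared secret
N = 8
pi = list(range(N))
rng.shuffle(pi)                 # pi uniform over all N! permutations
s = [rng.randint(0, 1) for _ in range(N)]
x = rm_encode(swap_prototype, pi, s, m=3)
```

Because $\pi\mathbf{x} = f_N(\pi\mathbf{s}, m)$, the decoder in (5) sees exactly what the prototype would produce for the permuted covertext, which is why the randomized code inherits the prototype's error behavior.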
Definition 4: Given alphabets $\mathcal{S}$ and $\mathcal{U}$, a steganographic channel $p_{XU|S}(x,u|s)$ subject to distortion $D_1$ is a
conditional PMF whose conditional marginal $p_{X|S}$ belongs to $\mathcal{Q}^{\mathrm{steg}}(p_S, D_1)$ of (3). We denote by $\mathcal{Q}^{\mathrm{steg}}(L, p_S, D_1)$
the set of steganographic channels subject to distortion $D_1$ when the alphabet $\mathcal{U}$ has cardinality $L$.

If the channel $p_{XU|S}$ satisfies the distortion constraint $D_1$ but not necessarily the steganographic constraint
$p_X = p_S$, $p_{XU|S}$ is simply a covert channel in the sense of [27], [31]. We shall denote by $\mathcal{Q}(L, p_S, D_1)$ the set of
all such covert channels. Clearly, $\mathcal{Q}^{\mathrm{steg}}(L, p_S, D_1) \subseteq \mathcal{Q}(L, p_S, D_1)$.

B. Attack Channels 

A passive warden simply produces Y = X. An active warden passes X through a discrete memoryless channel 
(DMC), producing a degraded sequence Y. 

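Definition 5: A discrete memoryless attack channel $p_{Y|X}$ is feasible if the expected distortion between $X$ and
$Y$ is at most $D_2$:

$$\sum_{x,y} p_X(x)\,p_{Y|X}(y|x)\,d(x,y) \le D_2. \tag{6}$$

Then the joint conditional PMF is given by

$$p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{N} p_{Y|X}(y_i|x_i).$$

We denote by

$$\mathcal{A}(p_X, D_2) \triangleq \Big\{ p_{Y|X} \in \mathcal{P}_{\mathcal{Y}|\mathcal{X}} : \sum_{x,y} p_X(x)\,p_{Y|X}(y|x)\,d(x,y) \le D_2 \Big\}$$

the set of all such feasible DMCs. This set is a compound DMC family.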
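A minimal sketch of Def. 5, assuming finite alphabets indexed $0, \dots, |\mathcal{X}|-1$, row-stochastic lists for $p_{Y|X}$, and Hamming distortion as an illustrative choice: `is_feasible` tests (6), and `apply_dmc` simulates the product channel letter by letter. The helper names are hypothetical.

```python
import random
from typing import List, Sequence


def hamming(a: int, b: int) -> float:
    return float(a != b)


def expected_attack_distortion(p_x, p_y_given_x, d) -> float:
    """Left-hand side of (6): sum over (x, y) of p_X(x) p_{Y|X}(y|x) d(x, y)."""
    return sum(p_x[a] * p_y_given_x[a][b] * d(a, b)
               for a in range(len(p_x)) for b in range(len(p_y_given_x[a])))


def is_feasible(p_x, p_y_given_x, d, d2, tol=1e-12) -> bool:
    """Membership test for the compound family A(p_X, D2)."""
    return expected_attack_distortion(p_x, p_y_given_x, d) <= d2 + tol


def apply_dmc(x: Sequence[int], p_y_given_x, rng: random.Random) -> List[int]:
    """Pass x through the memoryless channel one letter at a time, so that
    p(y|x) = prod_i p_{Y|X}(y_i | x_i)."""
    out = []
    for a in x:
        row = p_y_given_x[a]
        out.append(rng.choices(range(len(row)), weights=row, k=1)[0])
    return out
```

For uniform binary $X$ and $D_2 = 0.05$, a BSC with crossover 0.05 is feasible while one with crossover 0.1 is not. Note that feasibility depends on $p_X$, which the steganographic constraint pins to $p_S$.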

As an alternative to Def. 5, one may consider attack channels that have arbitrary memory but are subject to an
almost-sure distortion constraint [31]-[33]. In this case, the set of feasible attack channels is given by

$$\mathcal{A}'(p_X, D_2) = \big\{ p_{\mathbf{Y}|\mathbf{X}} : \Pr[d^N(\mathbf{X}, \mathbf{Y}) \le D_2] = 1 \big\}.$$

There are three reasons why only memoryless channels are considered in this paper. First, it is shown in [33] 
that for watermarking problems, both DMCs with expected distortion and arbitrary memory attack channels with 
almost sure distortion result in the same capacity formula, and the former allows a smaller random-coding error 
exponent when $D_2$ is the same. Thus, in terms of minimizing the random-coding exponent, selecting $p_{Y|X}$ from
the compound DMC class $\mathcal{A}(p_X, D_2)$ is a better strategy for the warden than selecting $p_{\mathbf{Y}|\mathbf{X}}$ from $\mathcal{A}'(p_X, D_2)$.
Second, the assumption of memorylessness simplifies the presentation of main ideas. Finally, note that the proofs 
for the compound DMC provide the basis for the proofs in the case of channels with arbitrary memory [32], [33]. 

C. Steganographic Capacity and Reliability Function 

The probability of error for a randomized code $(F_N, \Phi_N)$ under a channel $p_{\mathbf{Y}|\mathbf{X}}$ is given by

$$P_{e,N}(F_N, \Phi_N, p_{\mathbf{Y}|\mathbf{X}}) = \Pr(\hat M \ne M), \tag{7}$$

where the average is over all possible covertexts $\mathbf{S}$, messages $M$, and codes $(F_N, \Phi_N)$.

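Definition 6: A rate $R$ is achievable if there exists a randomized code $(F_N, \Phi_N)$ such that $|\mathcal{M}| \ge 2^{NR}$ and

$$\sup_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} P_{e,N}(F_N, \Phi_N, p_{Y|X}) \to 0 \quad \text{as } N \to \infty. \tag{8}$$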



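Definition 7: The steganographic capacity $C^{\mathrm{steg}}(D_1, D_2)$ is the supremum of all achievable rates.
Definition 8: The steganographic reliability function $E^{\mathrm{steg}}(R)$ is defined as

$$E^{\mathrm{steg}}(R) = \liminf_{N \to \infty} \left[ -\frac{1}{N} \log \inf_{(F_N, \Phi_N)} \sup_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} P_{e,N}(F_N, \Phi_N, p_{Y|X}) \right]. \tag{9}$$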



IV. From Order-1 to Perfectly Secure Steganographic Codes

Codes with conditionally constant composition (Def. 2) and randomly modulated codes (Def. 3) play a central
role in our code constructions and coding theorems. The following proposition suggests a general construction
for perfectly secure steganographic codes: first, select some deterministic prototype $f_N$ with the CCC and order-1
steganographic properties and maximum distortion $D_1$ (Def. 2); second, construct an RM code from that prototype.
In Section V we show that this strategy is optimal. The proof of the proposition appears in the appendix.

Proposition 1: Let $(\mathcal{M}, F_N, \Phi_N)$ be an RM code whose prototype $(f_N, \phi_N)$ has conditionally constant composition,
order-1 security, and maximum distortion $D_1$. Then $(\mathcal{M}, F_N, \Phi_N)$ is a perfectly secure steganographic code
with maximum distortion $D_1$ and the same error probability as the prototype $(f_N, \phi_N)$.
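Proposition 1 can be sanity-checked by exhaustive enumeration for very short blocks. The sketch below assumes a binary alphabet, an i.i.d. Bernoulli covertext, and the hypothetical type-preserving prototype `swap_prototype` (one 0 and one 1 exchanged, positions chosen by $m$). It computes the exact stegotext PMF of the RM code with rational arithmetic and compares it with $p_S^N$ for every sequence. This is a numerical check for tiny $N$, not a proof.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def apply_perm(pi, v):
    return tuple(v[j] for j in pi)


def invert_perm(pi):
    inv = [0] * len(pi)
    for i, j in enumerate(pi):
        inv[j] = i
    return inv


def swap_prototype(s, m):
    """Hypothetical CCC, order-1 prototype: swap one 0 and one 1 selected by m."""
    zeros = [i for i, b in enumerate(s) if b == 0]
    ones = [i for i, b in enumerate(s) if b == 1]
    x = list(s)
    if zeros and ones:
        i, j = zeros[m % len(zeros)], ones[m % len(ones)]
        x[i], x[j] = 1, 0
    return tuple(x)


def iid_prob(v, p1):
    """p_S^N(v) for an i.i.d. Bernoulli(p1) covertext."""
    k = sum(v)
    return p1 ** k * (1 - p1) ** (len(v) - k)


def stegotext_pmf(n, p1, m, f):
    """Exact PMF of X = pi^{-1} f(pi S, m), with S i.i.d. and pi uniform over all n! permutations."""
    pmf = {}
    weight = Fraction(1, factorial(n))
    for s in product((0, 1), repeat=n):
        ps = iid_prob(s, p1)
        for pi in permutations(range(n)):
            x = apply_perm(invert_perm(pi), f(apply_perm(pi, s), m))
            pmf[x] = pmf.get(x, 0) + ps * weight
    return pmf


n, p1 = 4, Fraction(1, 3)
for m in range(3):
    pmf = stegotext_pmf(n, p1, m, swap_prototype)
    assert all(pmf.get(x, 0) == iid_prob(x, p1) for x in product((0, 1), repeat=n))
print("p_X equals p_S^N for every message tested")
```

In this check, what drives the result is that the prototype output has the same type as its input, and that $\pi\mathbf{S}$ is again i.i.d. and independent of $\pi$; replacing the prototype with one that ignores the covertext type (for example, always outputting the all-zero sequence) makes the comparison fail.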

V. Steganographic Capacity and Random Coding Error Exponent 

The steganographic codes in our achievability proofs are randomly modulated binning codes with conditionally
constant composition. The existence of a good deterministic prototype is established using a random coding
argument. An arbitrarily large integer $L$ is selected, defining an alphabet $\mathcal{U} = \{1, 2, \dots, L\}$ for the auxiliary
random variable $U$ in the binning construction. Given the covertext $\mathbf{s}$ and the message $m$, the encoder selects an
appropriate sequence $\mathbf{u}$ in the binning code and then generates the stegotext randomly according to the uniform
distribution over an optimized type class $T^*_{\mathbf{x}|\mathbf{us}}$. Proofs of the theorem and propositions in this section appear in
Appendices II and III.
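The last encoding step, drawing the stegotext uniformly from a conditional type class, reduces to shuffling multisets. A minimal sketch, assuming the target conditional type is supplied as exact `Fraction` values that produce integer counts (the function name is hypothetical): positions are grouped by the pair $(u_i, s_i)$, each group receives exactly the prescribed number of each stegotext letter, and a uniform shuffle within each group yields a uniform draw from the class, because the class is the product of the arrangements of these multisets.

```python
import random
from collections import defaultdict
from fractions import Fraction


def sample_conditional_type_class(u, s, cond_type, rng):
    """Draw x uniformly from T_{x|us}: all x whose conditional type given (u, s) is cond_type.

    cond_type[(a, b)] maps each letter to the fraction of positions with
    (u_i, s_i) = (a, b) that must carry that letter.
    """
    groups = defaultdict(list)
    for i, key in enumerate(zip(u, s)):
        groups[key].append(i)
    x = [None] * len(u)
    for key, positions in groups.items():
        letters = []
        for letter, frac in cond_type[key].items():
            count = Fraction(frac) * len(positions)
            if count.denominator != 1:
                raise ValueError(f"fraction {frac} not realizable with {len(positions)} positions")
            letters.extend([letter] * int(count))
        if len(letters) != len(positions):
            raise ValueError("conditional PMF does not sum to one")
        rng.shuffle(letters)  # uniform over distinct arrangements of the multiset
        for i, letter in zip(positions, letters):
            x[i] = letter
    return x


u = [0, 0, 0, 0, 1, 1, 1, 1]
s = [0, 0, 1, 1, 0, 0, 1, 1]
cond_type = {
    (0, 0): {0: Fraction(1, 2), 1: Fraction(1, 2)},
    (0, 1): {1: Fraction(1)},
    (1, 0): {0: Fraction(1)},
    (1, 1): {0: Fraction(1, 2), 1: Fraction(1, 2)},
}
print(sample_conditional_type_class(u, s, cond_type, random.Random(7)))
```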

The following difference between two mutual informations,

$$J_L(p_S, p_{XU|S}, p_{Y|XUS}) \triangleq I(U;Y) - I(U;S), \tag{10}$$

plays a fundamental role in the analysis.
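The functional in (10) is easy to evaluate for small alphabets. The sketch below computes $J_L$ from $p_S$, $p_{XU|S}$, and a memoryless attack channel $p_{Y|X}$, so that $(U,S) \to X \to Y$ is a Markov chain as in (12). The binary example, in which $U = X$ and both the embedding and attack channels are binary symmetric, is an illustrative assumption; for that choice $I(U;Y) = 1 - h(D_2)$ and $I(U;S) = 1 - h(D_1)$, so $J_L = h(D_1) - h(D_2)$.

```python
import math


def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)


def j_functional(p_s, p_xu_given_s, p_y_given_x):
    """J = I(U;Y) - I(U;S) of (10) for the Markov chain (U,S) -> X -> Y.

    p_s[s] is a list, p_xu_given_s[s] is a dict {(x, u): prob}, p_y_given_x[x][y] is a list.
    """
    joint_us, joint_uy = {}, {}
    for s, ps in enumerate(p_s):
        for (x, u), pxu in p_xu_given_s[s].items():
            w = ps * pxu
            joint_us[(u, s)] = joint_us.get((u, s), 0.0) + w
            for y, py in enumerate(p_y_given_x[x]):
                joint_uy[(u, y)] = joint_uy.get((u, y), 0.0) + w * py
    return mutual_information(joint_uy) - mutual_information(joint_us)


def h2(t):
    """Binary entropy in bits."""
    return 0.0 if t in (0.0, 1.0) else -t * math.log2(t) - (1 - t) * math.log2(1 - t)


# Illustrative binary example: uniform S, X = S xor Bern(d1), U = X, Y = X xor Bern(d2).
d1, d2 = 0.2, 0.1
p_s = [0.5, 0.5]
p_xu_given_s = [{(x, x): (1 - d1 if x == s else d1) for x in (0, 1)} for s in (0, 1)]
bsc = [[1 - d2, d2], [d2, 1 - d2]]
print(j_functional(p_s, p_xu_given_s, bsc), h2(d1) - h2(d2))
```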

Theorem 1: Under Def. 1 for steganographic codes and Def. 5 for the compound attack channel, steganographic
capacity is given by

$$C^{\mathrm{steg}}(D_1, D_2) = \lim_{L \to \infty} C_L^{\mathrm{steg}}(D_1, D_2), \tag{11}$$

where

$$C_L^{\mathrm{steg}}(D_1, D_2) = \max_{p_{XU|S} \in \mathcal{Q}^{\mathrm{steg}}(L, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_L(p_S, p_{XU|S}, p_{Y|X}) \tag{12}$$

and $(U, S) \to X \to Y$ forms a Markov chain.

The proof of Theorem 1 is given in two parts. The converse part is proved in Appendix II. The direct part is a
corollary of a stronger result stated in Proposition 2 below, which provides a lower bound on the achievable error
exponent (hence an upper bound on the average probability of error) and is proved in Appendix III.

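Proposition 2: Under Def. 1 for steganographic codes and Def. 5 for the compound attack channel, the following
random-coding error exponent is achievable:

$$E_r^{\mathrm{steg}}(R) = \lim_{L \to \infty} E_{r,L}^{\mathrm{steg}}(R), \tag{13}$$

where

$$E_{r,L}^{\mathrm{steg}}(R) = \min_{\tilde p_S \in \mathcal{P}_{\mathcal{S}}} \ \max_{p_{XU|S} \in \mathcal{Q}^{\mathrm{steg}}(L, \tilde p_S, D_1)} \ \min_{\tilde p_{Y|XUS} \in \mathcal{P}_{\mathcal{Y}|\mathcal{X}\mathcal{U}\mathcal{S}}} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} \Big[ D(\tilde p_S\, p_{XU|S}\, \tilde p_{Y|XUS} \,\|\, p_S\, p_{XU|S}\, p_{Y|X}) + \big| J_L(\tilde p_S, p_{XU|S}, \tilde p_{Y|XUS}) - R \big|^+ \Big]. \tag{14}$$

Moreover, $E_r^{\mathrm{steg}}(R) = 0$ if and only if $R \ge C^{\mathrm{steg}}$.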

Remark 1: The capacity and error exponent formulas in (11)-(14) coincide with those for public watermarking
[32], [33], the only difference being that here the maximization over $p_{XU|S}$ is subject to a steganographic
constraint. Clearly, $E_r^{\mathrm{steg}}(R) \le E_r^{\mathrm{PubWM}}(R)$ and $C^{\mathrm{steg}} \le C^{\mathrm{PubWM}}$.

Remark 2: Proposition 2 is proved using a random binning technique. First, we establish the existence of a
deterministic prototype CCC code with order-1 steganographic property, maximum distortion $D_1$, and error exponent
$E_r^{\mathrm{steg}}(R)$. The decoder is an MPMI decoder. The main steps in this part of the proof are similar to those in the proof
of Theorem 3.2 in [33], with the additional order-1 steganographic constraint on the encoder. The second part of the
proof is an application of Proposition 1: random modulation of the CCC prototype code yields a perfectly secure
steganographic code with maximum distortion $D_1$ and error exponent $E_r^{\mathrm{steg}}(R)$.

Remark 3: As mentioned earlier, the covertext plays no role in the special case $D_1 \ge D_{\max}$, and so Alice can
generate $\mathbf{X}$ independently of $\mathbf{S}$. The capacity formula (11) then becomes simply

$$C^{\mathrm{steg}} = \min_{p_{Y|S} \in \mathcal{A}(p_S, D_2)} I(S;Y),$$

and the random-coding exponent is

$$E_r^{\mathrm{steg}}(R) = \min_{\tilde p_S} \ \min_{\tilde p_{Y|S} \in \mathcal{P}_{\mathcal{Y}|\mathcal{S}}} \ \min_{p_{Y|S} \in \mathcal{A}(p_S, D_2)} \Big[ D(\tilde p_{Y|S}\, \tilde p_S \,\|\, p_{Y|S}\, p_S) + \big| I_{\tilde p_S \tilde p_{Y|S}}(S;Y) - R \big|^+ \Big].$$

The binning codes are degenerate in this case; the expressions for capacity and random-coding exponents reduce to
classical formulas for compound DMCs without side information [34] and are achieved using constant-composition
codes. Further specializing this result to the case of a passive warden ($D_2 = 0$, hence $p_{Y|X} = \mathbb{1}\{y = x\}$), we obtain
$C^{\mathrm{steg}} = H(S)$, and $E_r^{\mathrm{steg}}(R)$ is given by (21); see Section VII.
Fig. 2: A binning scheme with a stack of variable-size codeword arrays indexed by the covertext sequence type.



The operation of the deterministic prototype code is illustrated in Fig. 2. The codebook $\mathcal{C}$ consists of a stack of
codeword arrays indexed by the possible covertext sequence types. Given an input $\mathbf{s}$, the encoder evaluates its type
$p_{\mathbf{s}}$ and selects the corresponding codeword array

$$\mathcal{C}(p_{\mathbf{s}}) = \big\{ \mathbf{u}(l, m, p_{\mathbf{s}}),\ 1 \le l \le 2^{N\rho(p_{\mathbf{s}})},\ 1 \le m \le |\mathcal{M}| \big\}, \tag{15}$$

in which the codewords are drawn from an optimized type class $T_{\mathbf{u}} = T^*_{\mathbf{u}}(p_{\mathbf{s}})$. Each array $\mathcal{C}(p_{\mathbf{s}})$ has $|\mathcal{M}|$ columns
and $2^{N\rho(p_{\mathbf{s}})}$ rows, where $\rho(p_{\mathbf{s}})$ is a function of the corresponding covertext type $p_{\mathbf{s}}$ and is termed the depth parameter
of the array. Given $\mathbf{y}$, the decoder seeks a codeword in $\mathcal{C} = \bigcup_{p_{\mathbf{s}}} \mathcal{C}(p_{\mathbf{s}})$ that maximizes the penalized empirical
mutual information and outputs its column index as the estimated message:

$$\hat m = \arg\max_{m} \ \max_{l,\, p_{\mathbf{s}}} \big[ I(\mathbf{u}(l, m, p_{\mathbf{s}}); \mathbf{y}) - \rho(p_{\mathbf{s}}) \big]. \tag{16}$$

By letting $\rho(p_{\mathbf{s}}) = I(\mathbf{u}; \mathbf{s}) + \epsilon$, where $T_{\mathbf{us}} = T^*_{\mathbf{us}}(p_{\mathbf{s}})$ is an optimized joint type and $\epsilon$ is an arbitrarily small
positive number, an optimal balance between the probability of encoding error and the probability of decoding
error is achieved. The former vanishes double-exponentially, while the latter vanishes at a rate given by the
random-coding error exponent in (14). The above MPMI decoder can be thought of as an empirical generalized
maximum a posteriori (MAP) decoder [33, Section 3.1].
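A brute-force sketch of the MPMI rule (16), assuming the stacked codebook is given explicitly as a dictionary keyed by hypothetical covertext-type labels (`"t0"`, `"t1"`), with one list of rows per message column and a depth parameter per array. It scans every codeword, so it only illustrates the decision rule; it is not an efficient decoder, and the toy codebook is made up for the example.

```python
import math
from collections import Counter


def empirical_mi(u, y):
    """Empirical mutual information I(u; y) in bits, from the joint type of (u, y)."""
    n = len(u)

    def ent(counter):
        return -sum(c / n * math.log2(c / n) for c in counter.values())

    return ent(Counter(u)) + ent(Counter(y)) - ent(Counter(zip(u, y)))


def mpmi_decode(codebook, rho, y):
    """Decoder (16): maximize I(u(l, m, t); y) - rho(t) over every codeword in the stack.

    codebook[t][m] lists the codewords (rows l) in column m of the array for type t;
    rho[t] is that array's depth parameter. Returns the column index m of the winner.
    """
    best_m, best_score = None, -math.inf
    for t, columns in codebook.items():
        for m, rows in enumerate(columns):
            for u in rows:
                score = empirical_mi(u, y) - rho[t]
                if score > best_score:
                    best_m, best_score = m, score
    return best_m


codebook = {
    "t0": [
        [[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]],  # column m = 0
        [[0, 0, 1, 1, 0, 0, 1, 1], [1, 0, 0, 1, 1, 0, 0, 1]],  # column m = 1
    ],
    "t1": [
        [[1, 1, 1, 1, 0, 0, 0, 0]],
        [[1, 0, 1, 0, 1, 0, 1, 0]],
    ],
}
rho = {"t0": 0.1, "t1": 0.05}
y = [0, 0, 1, 1, 0, 0, 1, 0]  # codeword u(1, 1, t0) with its last letter flipped
print(mpmi_decode(codebook, rho, y))  # prints 1
```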

VI. Secret Key 

In standard information-hiding problems with a compound DMC attack channel, deterministic codes are enough 
to achieve capacity; random coding is used as a method of proof to establish the existence of a deterministic code 
without actually specifying the code [37]. In our steganography problem, a randomized code is used to satisfy the
perfect-undetectability condition of (2); without the secret key, a deterministic code generally could not satisfy this
condition. Also note that a randomized code is generally needed if the attacks have arbitrary memory [31]-[33]. For
example, in watermarking games, an adversary who knows a deterministic code could decode and remove the message;
deterministic codes are vulnerable to this kind of "surgical attack" [4].
For randomized codes, the secret key shared between encoder and decoder is the source of common randomness.
For RM codes, the secret key specifies the value of the permutation $\pi$. The entropy rate of the secret key is

$$\frac{1}{N} \log_2 N! \le \log_2 N. \tag{17}$$
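The bound (17) says the key grows only logarithmically per covertext sample. A quick numerical sketch, using `math.lgamma` to evaluate $\ln N!$ without forming $N!$, compares the key rate with $\log_2 N$ and with the Stirling approximation $\log_2 N - \log_2 e$. For example, at $N = 10^6$ the key must supply roughly 18.5 bits per covertext sample.

```python
import math


def key_rate_bits(n: int) -> float:
    """(1/N) log2 N!: entropy per covertext sample of a uniformly drawn permutation."""
    return math.lgamma(n + 1) / math.log(2) / n


for n in (8, 1024, 10**6):
    r = key_rate_bits(n)
    # Stirling: (1/N) log2 N! = log2 N - log2 e + O(log N / N)
    print(f"N={n:>7}: key rate {r:.4f} bits/sample, log2 N = {math.log2(n):.4f}, "
          f"log2 N - log2 e = {math.log2(n) - math.log2(math.e):.4f}")
```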
VII. Passive Warden 

A passive warden introduces no degradation to the stegotext; in this case, $D_2 = 0$ and $\mathbf{Y} = \mathbf{X}$, i.e.,

$$p_{Y|X} = \mathbb{1}\{y = x\}. \tag{18}$$

This results in simplified expressions for the perfectly secure steganographic capacity in (11) and the random-coding
error exponent in (13); see Propositions 3 and 4 below. The proofs of these propositions appear in Appendices IV
and V, respectively.

Proposition 3: For the passive-warden case ($D_2 = 0$), the maximization in the capacity expression is achieved by $U = X$, and

$$C^{\rm stg}(D_1, 0) = \max_{p_{X|S} \in \mathcal{Q}^{\rm stg}(p_S, D_1)} H(X|S). \tag{19}$$

Remark. Since $H(X|S) = H(X) - I(S;X) = H(S) - I(S;X)$ (because $p_X = p_S$), we have

$$C^{\rm stg}(D_1, 0) = H(S) - \min_{p_{X|S} \in \mathcal{Q}^{\rm stg}(p_S, D_1)} I(S;X).$$

For the problem of encoding a source $S$ subject to distortion $D_1$, the minimum rate for representing the source is given by the rate-distortion function

$$R_S(D_1) = \min_{p_{X|S}:\ E\,d(S,X) \le D_1} I(S;X) \le \min_{p_{X|S} \in \mathcal{Q}^{\rm stg}(p_S, D_1)} I(S;X),$$

where the inequality holds because $p_{X|S} \in \mathcal{Q}^{\rm stg}(p_S, D_1)$ implies $E\,d(S,X) \le D_1$. Hence

$$C^{\rm stg}(D_1, 0) \le H(S) - R_S(D_1), \tag{20}$$

and the capacity-achieving codes for the passive-warden case are analogous to rate-distortion codes. Equality holds in (20) if the distribution that achieves the rate-distortion bound satisfies the steganographic property $p_X = p_S$.
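As a sanity check of the equality condition in (20), consider the binary-Hamming setting of Section IX: a Bernoulli(1/2) covertext and the covert channel BSC($D_1$). The sketch below (helper names are ours) verifies that this channel is steganographic ($p_X = p_S$), meets the distortion budget with equality, and attains $H(X|S) = h(D_1) = H(S) - R_S(D_1)$. Here we rely on the standard rate-distortion function $R_S(D) = 1 - h(D)$ of a Bernoulli(1/2) source under Hamming distortion, valid for $0 \le D \le 1/2$.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cond_entropy(ps, chan):
    """H(X|S) in bits for input PMF ps and channel chan[s][x]."""
    return -sum(ps[s] * p * math.log2(p) for s in range(len(ps)) for p in chan[s] if p > 0)

def output_pmf(ps, chan):
    """Marginal p_X induced by ps and chan."""
    return [sum(ps[s] * chan[s][x] for s in range(len(ps))) for x in range(len(chan[0]))]

def exp_hamming(ps, chan):
    """E d_H(S, X) = P[X != S]."""
    return sum(ps[s] * chan[s][x] for s in range(len(ps)) for x in range(len(chan[0])) if x != s)

def passive_bound(d1):
    """H(S) - R_S(D1) for a Bernoulli(1/2) source, assuming R_S(D) = 1 - h(D) on [0, 1/2]."""
    return 1.0 - (1.0 - h2(d1))

ps = [0.5, 0.5]
for d1 in (0.05, 0.1, 0.25, 0.4):
    bsc = [[1 - d1, d1], [d1, 1 - d1]]
    print(d1, output_pmf(ps, bsc), exp_hamming(ps, bsc), cond_entropy(ps, bsc), passive_bound(d1))
```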
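Proposition 4: For the passive-warden case ($D_2 = 0$), the random-coding exponent is given by

$$E_r^{\rm stg}(R) = \min_{\tilde{p}_S \in \mathcal{P}_S} \left[ D(\tilde{p}_S \,\|\, p_S) + \max_{p_{X|S} \in \mathcal{Q}^{\rm stg}(\tilde{p}_S, D_1)} \left| H_{\tilde{p}_S p_{X|S}}(X|S) - R \right|^+ \right]. \tag{21}$$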
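For a binary alphabet with Hamming distortion, (21) can be approximated by a brute-force grid search. The sketch below is a numerical illustration only, and all function names and grid sizes are our own choices. For $\tilde{p}_S = $ Bernoulli($t$), a binary $p_{X|S}$ is parameterized by $a = P(X{=}1|S{=}0)$ and $b = P(X{=}0|S{=}1)$. The steganographic constraint $p_X = \tilde{p}_S$ forces $(1-t)a = tb$, and the distortion constraint becomes $2(1-t)a \le D_1$.

```python
import math

def h2(p):
    """Binary entropy in bits (0 at the endpoints)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def kl_bern(t, p):
    """D(Bernoulli(t) || Bernoulli(p)) in bits."""
    out = 0.0
    if t > 0:
        out += t * math.log2(t / p)
    if t < 1:
        out += (1 - t) * math.log2((1 - t) / (1 - p))
    return out

def max_cond_entropy(t, d1, grid=400):
    """Approximate max H(X|S) over binary p_{X|S} with p_X = Bernoulli(t) and E d_H <= d1."""
    if t <= 0.0 or t >= 1.0:
        return 0.0                      # a deterministic covertext forces X = S
    # (1-t) a = t b keeps p_X = p~_S; distortion (1-t) a + t b = 2 (1-t) a <= d1; b <= 1
    a_max = min(d1 / (2 * (1 - t)), t / (1 - t), 1.0)
    best = 0.0
    for k in range(grid + 1):
        a = a_max * k / grid
        b = min((1 - t) * a / t, 1.0)
        best = max(best, (1 - t) * h2(a) + t * h2(b))
    return best

def exponent_passive(rate, d1, p=0.5, grid=200):
    """Grid approximation of (21): min over p~_S of D(p~_S || p_S) + |max H(X|S) - R|^+."""
    best = math.inf
    for j in range(1, grid):
        t = j / grid
        best = min(best, kl_bern(t, p) + max(max_cond_entropy(t, d1) - rate, 0.0))
    return best

for rate in (0.2, 0.4, 0.6, 0.8):
    print(rate, round(exponent_passive(rate, 0.2), 4))
```

For a Bernoulli(1/2) covertext, the exponent vanishes once $R$ exceeds $h(D_1)$, which is consistent with the capacity $C^{\rm stg}(D_1, 0) = h(D_1)$ in (31).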



VIII. Penalty for Perfect Security 

The capacity expressions for public watermarking in [27], [32] and for steganography in (11) take the same form, except that here the maximization over $p_{XU|S}$ is subject to the steganographic constraint. Consequently, we have

$$C^{\rm stg} \le C^{\rm PubWM} \tag{22}$$

and similarly

$$E_r^{\rm stg}(R) \le E_r^{\rm PubWM}(R). \tag{23}$$

In some special cases, the optimal covert channel for public watermarking may automatically satisfy the perfect-security condition, in which case (22) and (23) hold with equality. Proposition 5 below states sufficient conditions on the covertext PMF $p_S$ and the distortion function $d(\cdot,\cdot)$ under which the perfect-security constraint causes no penalty in communication performance.

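We consider $\mathcal{S} = \mathbb{Z}_q = \{0, 1, 2, \ldots, q-1\}$, which is a group under addition modulo $q$. We shall use the notation $\underline{k} \triangleq k \bmod q$. The covertext $S$ is uniformly distributed over $\mathbb{Z}_q$, i.e.,

$$p_S = \mathcal{U}(\mathcal{S}).$$

The associated distortion function $d: \mathcal{S} \times \mathcal{S} \to \mathbb{R}^+ \cup \{0\}$ satisfies

$$d(i, i) = 0 \quad \text{and} \quad d(i, j) = d(0, \underline{j - i}), \qquad \forall i, j \in \mathcal{S}.$$

If we write $\{d(i, j)\}_{i,j=0}^{q-1}$ in matrix form, the distortion matrix is cyclic-Toeplitz.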

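Definition 9: Let $\mathcal{V} = \{0, 1, \ldots, L-1\}$, $p_S = \mathcal{U}(\mathcal{S})$, and $\mathcal{U} = \{0, 1, 2, \ldots, qL-1\}$. Given any covert channel $p_{XV|S} \in \mathcal{Q}(L, p_S, D_1)$, where $v \in \mathcal{V}$, we define an associated covert channel $p_{XU|S} \in \mathcal{P}_{XU|S}$, where $u \in \mathcal{U}$, by

$$p_{XU|S}(x, qv + i \,|\, s) = \frac{1}{q}\, p_{XV|S}(\underline{x - i}, v \,|\, \underline{s - i}), \qquad \forall v \in \mathcal{V},\ \forall i, s, x \in \mathcal{S}. \tag{24}$$

For any stochastic matrix $p_{XV|S} \in \mathcal{Q}(L, p_S, D_1)$, by (24), the new channel $p_{XU|S}$ contains all $q$ cyclically shifted versions of $p_{XV|S}$ (with respect to $X$ and $S$), and these shifted versions are equally likely. Since the distortion function is cyclic, it is easy to verify that

$$E_{p_S p_{XU|S}}\, d(S, X) = E_{p_S p_{XV|S}}\, d(S, X) \le D_1.$$

Moreover, the marginal PMF $p_X$ induced by $p_S = \mathcal{U}(\mathcal{S})$ and $p_{XU|S}$ is given by

$$p_X(x) = \frac{1}{q} \sum_{i=0}^{q-1} \tilde{p}_X(\underline{x - i}) = \frac{1}{q} = p_S(x), \qquad \forall x \in \mathcal{S}, \tag{25}$$

where $\tilde{p}_X$ is the marginal PMF induced by $p_S = \mathcal{U}(\mathcal{S})$ and $p_{XV|S} \in \mathcal{Q}(L, p_S, D_1)$. That is,

$$p_{XU|S} \in \mathcal{Q}^{\rm stg}(qL, p_S, D_1).$$

Definition 10: The class $\mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)$ is the set of all such $p_{XU|S}$ defined in (24). Clearly, we have

$$\mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1) \subset \mathcal{Q}^{\rm stg}(qL, p_S, D_1) \subset \mathcal{Q}(qL, p_S, D_1). \tag{26}$$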
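The two claims attached to Definition 9, namely uniform stegotext marginal (25) and unchanged expected distortion, can be checked numerically. The sketch below builds a random $p_{XV|S}$, applies the cyclic extension (24), and compares marginals and distortions. The alphabet sizes, the cyclic distortion values `base`, and all helper names are hypothetical choices made for illustration.

```python
import random

def random_covert_channel(q, L, rng):
    """Random p_{XV|S} stored as chan[s][x][v]; each row s sums to 1 over (x, v)."""
    chan = []
    for _ in range(q):
        w = [[rng.random() for _ in range(L)] for _ in range(q)]
        total = sum(sum(row) for row in w)
        chan.append([[val / total for val in row] for row in w])
    return chan

def cyclic_extension(chan, q, L):
    """Eq. (24): p_{XU|S}(x, q*v + i | s) = (1/q) p_{XV|S}((x - i) mod q, v | (s - i) mod q)."""
    ext = [[[0.0] * (q * L) for _ in range(q)] for _ in range(q)]
    for s in range(q):
        for x in range(q):
            for v in range(L):
                for i in range(q):
                    ext[s][x][q * v + i] = chan[(s - i) % q][(x - i) % q][v] / q
    return ext

def marginal_x(chan, q):
    """p_X(x) for uniform p_S: (1/q) sum_s sum_u p_{XU|S}(x, u | s)."""
    return [sum(sum(chan[s][x]) for s in range(q)) / q for x in range(q)]

def expected_distortion(chan, dist, q):
    """E d(S, X) for uniform p_S."""
    return sum(sum(chan[s][x]) * dist[s][x] for s in range(q) for x in range(q)) / q

rng = random.Random(0)
q, L = 3, 2
base = [0.0, 1.0, 2.5]   # hypothetical d(0, k); d(i, j) = base[(j - i) mod q]
dist = [[base[(j - i) % q] for j in range(q)] for i in range(q)]
pxv = random_covert_channel(q, L, rng)
pxu = cyclic_extension(pxv, q, L)
print(marginal_x(pxv, q))   # generally not uniform
print(marginal_x(pxu, q))   # uniform, as in (25)
print(expected_distortion(pxv, dist, q), expected_distortion(pxu, dist, q))
```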
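Definition 11: The class of cyclic attack channels subject to distortion $D_2$ is defined as

$$\mathcal{A}_{\rm cyc}(D_2) = \Big\{ p_{Y|X} \in \mathcal{P}_{Y|X} : p_{Y|X}(y|x) = p_{Y|X}(\underline{y - x}\,|\,0),\ \forall x, y \in \mathcal{S}, \ \text{and} \ \sum_{y=0}^{q-1} p_{Y|X}(y|0)\, d(0, y) \le D_2 \Big\}. \tag{27}$$

Any stochastic matrix $p_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)$ is cyclic-Toeplitz. Moreover, since $d(x, y) = d(0, \underline{y - x})$, the expected distortion of such a channel equals $\sum_{y} p_{Y|X}(y|0)\, d(0, y)$ for every input PMF. Hence, for any $p_X \in \mathcal{P}_X$,

$$\mathcal{A}_{\rm cyc}(D_2) \subset \mathcal{A}(p_X, D_2). \tag{28}$$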
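The inclusion (28) rests on the fact that a cyclic channel has the same expected distortion under every input PMF. The sketch below checks this for one hypothetical cyclic channel and distortion table on $\mathbb{Z}_4$, using several random input PMFs. The values in `base` and `row`, and the helper names, are illustrative assumptions.

```python
import random

def cyclic_channel(row):
    """p_{Y|X}(y|x) = row[(y - x) mod q], cf. (27); row is p_{Y|X}(. | 0)."""
    q = len(row)
    return [[row[(y - x) % q] for y in range(q)] for x in range(q)]

def attack_distortion(px, chan, dist):
    """E d(X, Y) = sum_x p_X(x) sum_y p_{Y|X}(y|x) d(x, y)."""
    q = len(px)
    return sum(px[x] * chan[x][y] * dist[x][y] for x in range(q) for y in range(q))

q = 4
base = [0.0, 1.0, 2.0, 1.0]            # hypothetical d(0, k); d(x, y) = base[(y - x) mod q]
dist = [[base[(y - x) % q] for y in range(q)] for x in range(q)]
row = [0.7, 0.1, 0.1, 0.1]             # p_{Y|X}(. | 0)
chan = cyclic_channel(row)
d_cyc = sum(r * b for r, b in zip(row, base))   # left-hand side of the constraint in (27)

rng = random.Random(1)
for _ in range(3):
    w = [rng.random() for _ in range(q)]
    px = [v / sum(w) for v in w]       # arbitrary input PMF
    print(round(attack_distortion(px, chan, dist), 12), round(d_cyc, 12))
```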

Proposition 5: For the above $q$-ary information-hiding problem, the capacities of the perfectly secure steganography game and the public watermarking game are the same. That is, the perfect-security constraint in (2) does not cause any capacity loss. Moreover, there is no loss of optimality in restricting the maximization in (12) to $\mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)$ and the minimization to $\mathcal{A}_{\rm cyc}(D_2)$:

$$C^{\rm stg}(D_1, D_2) = C^{\rm PubWM}(D_1, D_2) = \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)} J_{qL}(p_S, p_{XU|S}, p_{Y|X}). \tag{29}$$

The proof is given in Appendix VI.

IX. Example: Binary-Hamming Case 

We illustrate the above results through the following example, where $\mathcal{S} = \{0, 1\}$ and the covertext is a Bernoulli(1/2) sequence, i.e.,

$$\Pr[S = 1] = \Pr[S = 0] = \tfrac{1}{2}.$$

The Hamming distortion metric is used: $d(x, y) = \mathbb{1}\{x \ne y\}$.

A. Capacity 

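The capacity in the public watermarking game setting is given in [33] as follows:

$$C = \begin{cases} \dfrac{D_1}{d_{D_2}}\left[h(d_{D_2}) - h(D_2)\right], & \text{if } 0 \le D_1 < d_{D_2}; \\ h(D_1) - h(D_2), & \text{if } d_{D_2} \le D_1 \le 1/2; \\ 1 - h(D_2), & \text{if } D_1 > 1/2, \end{cases} \tag{30}$$

where $h(\cdot)$ is the binary entropy function and $d_{D_2} \triangleq 1 - 2^{-h(D_2)}$. When $D_2 = 0$,

$$C = \begin{cases} h(D_1), & \text{if } 0 \le D_1 \le 1/2; \\ 1, & \text{if } D_1 > 1/2. \end{cases} \tag{31}$$

Fig. 3 shows the above two capacity functions.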
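The piecewise capacity (30) is easy to evaluate directly. The sketch below (function names are ours) implements it and checks two properties visible in the formula: it reduces to (31) when $D_2 = 0$, and the time-sharing segment is the tangent from the origin to $h(D_1) - h(D_2)$, touching it at $d_{D_2} = 1 - 2^{-h(D_2)}$ where the slope $\log_2\frac{1-d}{d}$ equals $[h(d) - h(D_2)]/d$.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def d_threshold(d2):
    """d_{D2} = 1 - 2^{-h(D2)}, where the time-sharing line of (30) touches h(D1) - h(D2)."""
    return 1.0 - 2.0 ** (-h2(d2))

def capacity_binary(d1, d2):
    """Capacity (30) for a Bernoulli(1/2) covertext and Hamming distortion."""
    if d1 > 0.5:
        return 1.0 - h2(d2)
    d = d_threshold(d2)
    if d1 < d:
        # time sharing: embed with BSC(d) on a fraction d1/d of the samples, no embedding elsewhere
        return d1 / d * (h2(d) - h2(d2))
    return h2(d1) - h2(d2)

for d2 in (0.0, 0.1, 0.2):
    print(d2, [round(capacity_binary(d1, d2), 4) for d1 in (0.1, 0.2, 0.3, 0.4, 0.5)])
```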
Fig. 3: Capacity for a perfectly secure steganography game when the covertext $S$ is a Bernoulli(1/2) sequence.

The optimal attack channel is a binary symmetric channel (BSC) with crossover probability $D_2$. If $d_{D_2} \le D_1 \le 1/2$, the optimal covert channel is also a binary symmetric channel, BSC($D_1$) (i.e., $|\mathcal{U}| = 2$, $U = X$, and $p_{XU|S} = p_{X|S}$). If $D_1 < d_{D_2}$, the capacity is achieved by time sharing: no embedding on a fraction $1 - D_1/d_{D_2}$ of the samples, and embedding with the optimal covert channel BSC($d_{D_2}$) on the rest of the samples. Since the covertext $S$ is a Bernoulli(1/2) sequence, the output of a BSC($p$) covert channel driven by $S$ is also Bernoulli(1/2). That is, the optimal covert channel for the public watermarking game satisfies $p_X = p_S$, and the perfect-security constraint does not cause any loss in capacity, as stated by Proposition 5.

B. Random-Coding Exponent 

In [33], we numerically computed the random-coding exponent for public watermarking in the case of $D_1 = 0.4$, $D_2 = 0.2$, and $|\mathcal{U}| = 2$, as shown in Fig. 4. We found that the optimal covert channel is still a BSC($D_1$) ($p_{XU|S} = p_{X|S}$) with the time-sharing strategy. This implies that, at least for the case $|\mathcal{U}| = 2$, $p_X = p_S$, and the perfect-security constraint causes no loss in random-coding exponent either.

C. Randomized Nested Linear Codes — A Capacity-Achieving Code Construction 

For information-embedding problems with a fixed attack channel BSC($D_2$), deterministic nested binary linear codes were proposed to achieve capacity, where $\mathcal{C}_1$, a good source code with Hamming distortion $D_1$, is nested in $\mathcal{C}_2$, a good channel code over BSC($D_2$) [38], [39]. When $|\mathcal{C}_2| = 2^{N[1 - h(D_2)]}$ and $|\mathcal{C}_1| = 2^{N[1 - h(D_1)]}$, the asymptotic code rate

$$\frac{1}{N}\log_2 \frac{|\mathcal{C}_2|}{|\mathcal{C}_1|} = h(D_1) - h(D_2)$$

is equal to the capacity in the regime $D_1 \ge d_{D_2}$. In the regime $D_1 < d_{D_2}$, the time-sharing strategy of (30) is applied. These nested linear codes apply to both public watermarking and steganography because BSC($D_2$) is the optimal discrete memoryless attack channel. The stegotext codewords are elements of $\mathcal{C}_2$ [38], [39].

Fig. 4: Random-coding exponent for perfectly secure steganography game when the covertext $S$ is a Bernoulli(1/2) sequence, $D_1 = 0.4$, $D_2 = 0.2$, and $|\mathcal{U}| = 2$.

February 16, 2007. Revised September 30, 2007. DRAFT

In the passive-warden case ($D_2 = 0$), we simply let $\mathcal{C}_2 = \mathbb{F}_2^N$, and perfect security is achieved even without a secret key. In the active-warden case, $\mathcal{C}_2$ is a subgroup of $\mathbb{F}_2^N$, and randomization via the secret key plays an essential role in achieving perfect security. The strategy described below makes the transmitted stegotext uniformly distributed over $\mathbb{F}_2^N$. The resulting code is a randomized nested binary linear code.

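Partition the whole space $\mathbb{F}_2^N$ into a disjoint union of $\mathcal{C}_2$ and its cosets:

$$\mathbb{F}_2^N = \bigcup_{\mathbf{c} \in \Omega_2} (\mathcal{C}_2 \oplus \mathbf{c}), \tag{32}$$

where $\mathcal{C}_2 \oplus \mathbf{c}$ is a coset of $\mathcal{C}_2$, the element $\mathbf{c} \in \Omega_2$ is a coset leader, and the set $\Omega_2$ contains all coset leaders. We have

$$|\Omega_2| = \frac{2^N}{|\mathcal{C}_2|} = 2^{N h(D_2)}. \tag{33}$$

Let the secret key $K$ be uniformly distributed over $\Omega_2$. For any $\mathbf{k} \in \Omega_2$, the encoder output is defined as

$$\mathbf{x} = f_N^{\mathbf{k}}(m, \mathbf{s}) = f_N(m, \mathbf{s} \oplus \mathbf{k}) \oplus \mathbf{k}, \tag{34}$$

where $f_N(\cdot,\cdot)$ is the deterministic encoder used for the information-embedding or watermarking problem. The decoding function is

$$\hat{m} = \phi_N^{\mathbf{k}}(\mathbf{y}) = \phi_N(\mathbf{y} \oplus \mathbf{k}), \tag{35}$$

where $\phi_N(\cdot)$ is the decoder associated with $f_N(\cdot,\cdot)$.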

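Since the covertext is uniformly distributed over $\mathbb{F}_2^N$, the vector $\mathbf{s} \oplus \mathbf{k}$ is uniformly distributed for every $\mathbf{k}$. Therefore the output of the deterministic encoder, which is uniformly distributed over $\mathcal{C}_2$, is independent of the secret key $K$, which is uniformly distributed over $\Omega_2$. The coset decomposition property (32) then ensures that the randomized encoder output of (34) is uniformly distributed over $\mathbb{F}_2^N$. Hence perfect security is achieved. By (33), the entropy rate of the secret key is $h(D_2)$, unlike the $\log_2 N$ growth required for general RM codes in (17).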
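The key-randomization argument can be checked exhaustively on a toy instance. The sketch below assumes a hypothetical nested pair of length $N = 4$: $\mathcal{C}_1 = \{0000, 1111\}$ is nested in $\mathcal{C}_2 = \{0000, 1111, 1100, 0011\}$, with one message bit selecting a coset of $\mathcal{C}_1$ inside $\mathcal{C}_2$. These are not the good codes of [38], [39]. The deterministic embedder is our own stand-in: a syndrome-style quantizer whose output is uniform over $\mathcal{C}_2$ when the covertext is uniform. The script enumerates all messages, covertexts, and keys in $\Omega_2$, and confirms that (34) yields a flat stegotext histogram, that (35) decodes correctly without noise, and that the per-sequence distortion is unchanged by the key.

```python
from collections import Counter

N = 4

def span(gens):
    """All GF(2) linear combinations of the generators (bit vectors stored as ints)."""
    code = {0}
    for g in gens:
        code |= {c ^ g for c in code}
    return code

def coset_leaders(code, n):
    """Map every vector to a minimum-weight representative (coset leader) of its coset."""
    leader = {}
    for v in sorted(range(2 ** n), key=lambda w: (bin(w).count("1"), w)):
        if v not in leader:
            for c in code:
                leader[v ^ c] = v
    return leader

# Hypothetical toy nested pair C1 inside C2.
C1 = span([0b1111])
C2 = span([0b1111, 0b1100])
COSET_REP = {0: 0b0000, 1: 0b1100}        # message m -> coset COSET_REP[m] + C1 inside C2
LEAD1 = coset_leaders(C1, N)
LEAD2 = coset_leaders(C2, N)
OMEGA2 = sorted(set(LEAD2.values()))      # key space: coset leaders of C2, cf. (32)-(33)

def f_det(m, s):
    """Stand-in deterministic embedder: quantize s onto the coset COSET_REP[m] + C1."""
    v = s ^ COSET_REP[m]
    return COSET_REP[m] ^ (v ^ LEAD1[v])

def phi_det(y):
    """Stand-in decoder: nearest C2 codeword, then the index of its C1-coset."""
    c = y ^ LEAD2[y]
    return next(m for m, r in COSET_REP.items() if c ^ r in C1)

def f_key(m, s, k):
    """Randomized encoder (34): x = f(m, s xor k) xor k."""
    return f_det(m, s ^ k) ^ k

def phi_key(y, k):
    """Randomized decoder (35): m_hat = phi(y xor k)."""
    return phi_det(y ^ k)

# Stegotext histogram over all (m, s, k): flat over F_2^N means perfect security.
hist = Counter(f_key(m, s, k) for m in COSET_REP for s in range(2 ** N) for k in OMEGA2)
print(len(hist), set(hist.values()))   # 16 {8}
```

Without the key (that is, $\mathbf{k} = \mathbf{0}$ only), the stegotext would be confined to the four codewords of $\mathcal{C}_2$, which a warden could easily detect.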

X. Conclusion 

A strict definition of perfect security has been adopted in this paper, implying that even a warden with unlimited computational resources is unable to reliably detect the presence of a hidden message. We have studied the Shannon-theoretic limits of communication performance under this perfect-security requirement, as well as the structure of codes that asymptotically achieve those limits. The main results are summarized below.

• Perfectly secure steganography is closely related to the public watermarking problem of [27], [33]. Positive capacity and random-coding exponents are achieved using stacked-binning codes and an MPMI decoder.

• Randomized codes are generally needed to achieve perfect security. The common randomness is provided by a secret key shared between the encoder and decoder. For i.i.d. covertexts, Proposition 1 shows that perfectly secure steganographic codes can be constructed using randomized permutations of a prototype CCC watermarking code that merely has an order-1 security property, i.e., the prototype code matches the first-order marginals of the covertext and stegotext, but not the full $N$-dimensional statistics.

• The cost of perfect security in terms of communication performance is the same as the cost of order-1 security. However, if the covertext distribution is uniform and the distortion metric is cyclically symmetric, the security constraint does not cause any loss of performance.

Computational Security. This paper has focused on the interplay between communication performance and information-theoretic security, where security is achieved using a private key that is uniformly distributed over a group $\mathcal{G}^{\rm sub}$. A more practical setup would involve a public-key system, in which a reduced set of representers of $\mathcal{G}^{\rm sub}$ is selected, each corresponding to a value of the key. Assume that the uniform distribution over this reduced set is computationally indistinguishable (in a sense to be precisely defined) from the uniform distribution over $\mathcal{G}^{\rm sub}$. The resulting steganographic code is no longer perfectly secure but inherits the computational security of the key-generation mechanism. Thus the framework analyzed in this paper can form the basis for constructing computationally secure steganographic codes that have near-optimal communication performance.

Extensions. Our basic framework can also be used to analyze complex problems involving covertexts with Markov 
dependencies and covertexts defined over continuous alphabets [40, Sec. X]. While such extensions are technically 
challenging, we hope that the mathematical structure of optimal codes identified in this paper under simplifying 
assumptions will shed some light on the development of practical codes with high communication performance. 
Appendix I
Proof of Proposition 1

First we verify the perfect-security condition. For RM codes, we have

$$p_{\mathbf{X}|\pi,\mathbf{S},M}(\mathbf{x}|\pi, \mathbf{s}, m) = \mathbb{1}\{\pi\mathbf{x} = f_N(\pi\mathbf{s}, m)\}.$$

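Also note that for any $\mathbf{x}, \mathbf{z} \in T_{\mathbf{s}}$, there exists a permutation $\pi_0$ such that $\mathbf{x} = \pi_0\mathbf{z}$. Hence the value of the sum $\sum_{\pi} \mathbb{1}\{\pi\mathbf{x} = \mathbf{z}\}$ is independent of $\mathbf{z}$ (conditioned on $\mathbf{z} \in T_{\mathbf{s}}$), and so, for $\mathbf{x} \in T_{\mathbf{s}}$,

$$\frac{1}{N!}\sum_{\pi} \mathbb{1}\{\pi\mathbf{x} = \mathbf{z}\} = \frac{1}{|T_{\mathbf{s}}|}\sum_{\mathbf{z}' \in T_{\mathbf{s}}} \frac{1}{N!}\sum_{\pi} \mathbb{1}\{\pi\mathbf{x} = \mathbf{z}'\} = \frac{1}{|T_{\mathbf{s}}|}, \qquad \forall \mathbf{z} \in T_{\mathbf{s}}. \tag{36}$$

Hence for any type class $T_{\mathbf{s}}$ we have

$$\begin{aligned}
p_{\mathbf{X}|T_{\mathbf{s}}}(\mathbf{x}|T_{\mathbf{s}}) &= \sum_{m \in \mathcal{M}} \frac{1}{|\mathcal{M}|} \sum_{\mathbf{s}' \in T_{\mathbf{s}}} \frac{1}{|T_{\mathbf{s}}|} \sum_{\pi} \frac{1}{N!}\, \mathbb{1}\{\pi\mathbf{x} = f_N(\pi\mathbf{s}', m)\} \\
&\overset{(a)}{=} \sum_{m \in \mathcal{M}} \frac{1}{|\mathcal{M}|} \sum_{\mathbf{s}'' \in T_{\mathbf{s}}} \frac{1}{|T_{\mathbf{s}}|}\, \frac{1}{N!}\sum_{\pi} \mathbb{1}\{\pi\mathbf{x} = f_N(\mathbf{s}'', m)\} \\
&\overset{(b)}{=} \sum_{m \in \mathcal{M}} \frac{1}{|\mathcal{M}|} \sum_{\mathbf{s}'' \in T_{\mathbf{s}}} \frac{1}{|T_{\mathbf{s}}|}\, \frac{1}{|T_{\mathbf{s}}|}\, \mathbb{1}\{\mathbf{x} \in T_{\mathbf{s}}\} \\
&= \frac{1}{|T_{\mathbf{s}}|}\, \mathbb{1}\{\mathbf{x} \in T_{\mathbf{s}}\},
\end{aligned} \tag{37}$$

where in (a) we have made the change of variables $\mathbf{s}'' = \pi\mathbf{s}'$, and in (b) we have used (36) with $\mathbf{z} = f_N(\mathbf{s}'', m)$. This $\mathbf{z}$ lies in $T_{\mathbf{s}}$ because the prototype code is order-1 secure. If $\mathbf{x} \notin T_{\mathbf{s}}$, then $\pi\mathbf{x} \notin T_{\mathbf{s}}$ and the sum vanishes. From (37) we obtain

$$p_{\mathbf{X}}(\mathbf{x}) = \sum_{T_{\mathbf{s}}} p_{\mathbf{S}}(T_{\mathbf{s}})\, p_{\mathbf{X}|T_{\mathbf{s}}}(\mathbf{x}|T_{\mathbf{s}}) = \sum_{T_{\mathbf{s}}} p_{\mathbf{S}}(T_{\mathbf{s}})\, \frac{1}{|T_{\mathbf{s}}|}\, \mathbb{1}\{\mathbf{x} \in T_{\mathbf{s}}\} = p_{\mathbf{S}}(\mathbf{x}), \qquad \forall \mathbf{x} \in \mathcal{S}^N,$$

where the last equality holds because all sequences in a type class are equiprobable under the i.i.d. distribution $p_{\mathbf{S}}$. Hence the perfect-security condition (2) is satisfied.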
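The perfect-security part of Proposition 1 can be verified by exact enumeration for a tiny example. The sketch below uses a hypothetical order-1 secure prototype on binary sequences of length $N = 3$: message 0 leaves $\mathbf{s}$ unchanged, and message 1 sorts it, so the type is always preserved. The covertext is assumed i.i.d. Bernoulli(3/10). Using exact `Fraction` arithmetic, the check shows that the prototype alone distorts the sequence distribution, whereas the RM code $F_N(\mathbf{s}, m) = \pi^{-1} f_N(\pi\mathbf{s}, m)$ with a uniformly random permutation reproduces $p_{\mathbf{S}}$ exactly.

```python
from fractions import Fraction
from itertools import permutations, product

N = 3
P1 = Fraction(3, 10)          # assumed P[S_i = 1] for an i.i.d. Bernoulli covertext
MESSAGES = (0, 1)

def p_iid(x):
    """p_S^N(x) for the i.i.d. Bernoulli(P1) source."""
    out = Fraction(1)
    for b in x:
        out *= P1 if b else 1 - P1
    return out

def f_proto(s, m):
    """Hypothetical prototype with order-1 security: the output has the same type as s."""
    return s if m == 0 else tuple(sorted(s))

def permute(perm, x):
    """(pi x)_i = x[perm[i]]."""
    return tuple(x[j] for j in perm)

def unpermute(perm, y):
    """Inverse of permute: returns pi^{-1} y."""
    x = [0] * len(y)
    for i, j in enumerate(perm):
        x[j] = y[i]
    return tuple(x)

def rm_encode(s, m, perm):
    """RM code: F_N(s, m) = pi^{-1} f_N(pi s, m)."""
    return unpermute(perm, f_proto(permute(perm, s), m))

def output_pmf(weighted_outputs):
    """Accumulate Pr[X = x] from (probability weight, x) pairs."""
    pmf = {}
    for w, x in weighted_outputs:
        pmf[x] = pmf.get(x, Fraction(0)) + w
    return pmf

seqs = list(product((0, 1), repeat=N))
perms = list(permutations(range(N)))
pm, pp = Fraction(1, len(MESSAGES)), Fraction(1, len(perms))
proto = output_pmf((p_iid(s) * pm, f_proto(s, m)) for s in seqs for m in MESSAGES)
rm = output_pmf((p_iid(s) * pm * pp, rm_encode(s, m, pi))
                for s in seqs for m in MESSAGES for pi in perms)

print(all(rm.get(x, 0) == p_iid(x) for x in seqs))      # True: stegotext matches p_S^N
print(all(proto.get(x, 0) == p_iid(x) for x in seqs))   # False: the prototype alone does not
```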

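Now, to verify the maximum-distortion constraint (1), note that for every $\pi$ we have

$$d^N(\mathbf{s}, F_N(\mathbf{s}, m)) \overset{(a)}{=} d^N(\mathbf{s}, \pi^{-1} f_N(\pi\mathbf{s}, m)) \overset{(b)}{=} d^N(\pi\mathbf{s}, f_N(\pi\mathbf{s}, m)) \overset{(c)}{\le} D_1,$$

where (a) uses the definition of $F_N$ in (4), (b) holds because the distortion measure is additive, and (c) holds because of our initial assumption on the prototype $f_N$. Therefore (1) holds.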

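Finally, let us evaluate the error probability for the RM code. Since the covertext source and the attack channel are memoryless, we have

$$p_{\mathbf{S}}(\mathbf{s}) = p_{\mathbf{S}}(\pi\mathbf{s}) \quad \text{and} \quad p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = p_{\mathbf{Y}|\mathbf{X}}(\pi\mathbf{y}|\pi\mathbf{x}) \tag{38}$$

for any permutation $\pi$. The error probability for the prototype code takes the form

$$P_{e,N}(f_N, \phi_N, p_{Y|X}) = \sum_{m \in \mathcal{M}} \frac{1}{|\mathcal{M}|} \sum_{\mathbf{s} \in \mathcal{S}^N} p_{\mathbf{S}}(\mathbf{s}) \sum_{\mathbf{x} \in \mathcal{S}^N} \mathbb{1}\{\mathbf{x} = f_N(\mathbf{s}, m)\} \sum_{\mathbf{y} \in \mathcal{S}^N} p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\{\phi_N(\mathbf{y}) \ne m\}.$$

For the prototype code modulated with permutation $\pi$, denoted $(f_N^{\pi}, \phi_N^{\pi})$, we have

$$\begin{aligned}
P_{e,N}(f_N^{\pi}, \phi_N^{\pi}, p_{Y|X}) &= \sum_{m} \frac{1}{|\mathcal{M}|} \sum_{\mathbf{s}} p_{\mathbf{S}}(\mathbf{s}) \sum_{\mathbf{x}} \mathbb{1}\{\pi\mathbf{x} = f_N(\pi\mathbf{s}, m)\} \sum_{\mathbf{y}} p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\{\phi_N(\pi\mathbf{y}) \ne m\} \\
&\overset{(a)}{=} \sum_{m} \frac{1}{|\mathcal{M}|} \sum_{\mathbf{s}} p_{\mathbf{S}}(\pi\mathbf{s}) \sum_{\mathbf{x}} \mathbb{1}\{\pi\mathbf{x} = f_N(\pi\mathbf{s}, m)\} \sum_{\mathbf{y}} p_{\mathbf{Y}|\mathbf{X}}(\pi\mathbf{y}|\pi\mathbf{x})\, \mathbb{1}\{\phi_N(\pi\mathbf{y}) \ne m\} \\
&\overset{(b)}{=} \sum_{m} \frac{1}{|\mathcal{M}|} \sum_{\mathbf{s}' \in \pi\mathcal{S}^N} p_{\mathbf{S}}(\mathbf{s}') \sum_{\mathbf{x}' \in \pi\mathcal{S}^N} \mathbb{1}\{\mathbf{x}' = f_N(\mathbf{s}', m)\} \sum_{\mathbf{y}' \in \pi\mathcal{S}^N} p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}'|\mathbf{x}')\, \mathbb{1}\{\phi_N(\mathbf{y}') \ne m\} \\
&\overset{(c)}{=} P_{e,N}(f_N, \phi_N, p_{Y|X}),
\end{aligned} \tag{39}$$

where (a) holds because of (38), (b) is obtained using the change of variables $\mathbf{s}' = \pi\mathbf{s}$, $\mathbf{x}' = \pi\mathbf{x}$, $\mathbf{y}' = \pi\mathbf{y}$, and (c) holds because the three sums run over all elements $(\mathbf{s}', \mathbf{x}', \mathbf{y}')$ of $\mathcal{S}^N \times \mathcal{S}^N \times \mathcal{S}^N$ (since $\pi\mathcal{S}^N = \mathcal{S}^N$), so the order of summation is inconsequential. Since (39) holds for every permutation $\pi$, the error probability for the RM code is equal to

$$P_{e,N}(F_N, \Phi_N, p_{Y|X}) = \frac{1}{N!}\sum_{\pi} P_{e,N}(f_N^{\pi}, \phi_N^{\pi}, p_{Y|X}) = P_{e,N}(f_N, \phi_N, p_{Y|X}).$$

This completes the proof.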

Appendix II 
Converse Proof of Theorem 1

The converse is an extension of the proof in [33, Section 7]. Our upper bound on achievable rates is derived by

• replacing the perfect-security constraint with a weaker order-1 security constraint on the encoder:

$$p_{\mathbf{x}} = p_{\mathbf{s}}, \qquad \forall m, \mathbf{s}, \ \mathbf{x} = f_N(\mathbf{s}, m) \tag{40}$$

(matching the types of the input $\mathbf{s}$ and output $\mathbf{x} = f_N(\mathbf{s}, m)$ of the encoder $f_N$),

• replacing the almost-sure distortion constraint with an expected distortion constraint on the encoder:

$$\frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} \sum_{\mathbf{s} \in \mathcal{S}^N} p_{\mathbf{S}}(\mathbf{s})\, d^N(\mathbf{s}, f_N(\mathbf{s}, m)) \le D_1, \tag{41}$$

• and providing the decoder with knowledge of the attack channel $p_{Y|X}$.

Clearly, any upper bound we derive under these assumptions is an upper bound on capacity as well. For any rate-$R$ code $(f_N, \phi_N)$ and DMC $p_{Y|X} \in \mathcal{A}(p_X, D_2)$, we have

$$NR = H(M) = H(M|\mathbf{Y}) + I(M;\mathbf{Y}) \le 1 + P_e(f_N, \phi_N, p_{Y|X})\, NR + I(M;\mathbf{Y}),$$

where the inequality is due to Fano's inequality. In order for $P_e$ to vanish as $N \to \infty$, the rate $R$ needs to satisfy (asymptotically)

$$NR - 1 \le \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} I(M;\mathbf{Y}). \tag{42}$$
The joint PMF of $(M, \mathbf{S}, \mathbf{X}, \mathbf{Y})$ is given by

$$p_{M\mathbf{SXY}|f_N} = p_M\, p_{\mathbf{S}}\, p_{\mathbf{Y}|\mathbf{X}}\, \mathbb{1}\{\mathbf{X} = f_N(\mathbf{S}, M)\}. \tag{43}$$

Owing to (43), for any $1 \le i \le N$, $(M, \mathbf{S}, \{Y_j\}_{j \ne i}) \to X_i \to Y_i$ forms a Markov chain, and so does

$$(W_i, S_i) \to X_i \to Y_i, \tag{44}$$

where the random variable $W_i$ is defined as

$$W_i = (M, S_{i+1}, \ldots, S_N, Y_1, \ldots, Y_{i-1}). \tag{45}$$

Using the same set of inequalities as in [26, Lemma 4], we obtain

$$I(M;\mathbf{Y}) \le \sum_{i=1}^{N} \left[ I(W_i;Y_i) - I(W_i;S_i) \right]. \tag{46}$$

We define a time-sharing random variable $T$, which is uniformly distributed over $\{1, \ldots, N\}$ and independent of all other random variables, and define the quadruple of random variables $(W, S, X, Y)$ as $(W_T, S_T, X_T, Y_T)$. With this definition, the order-1 security constraint (40) becomes $p_X = p_S$, and the expected distortion constraint (41) becomes $\sum_{s,x} p_S(s)\, p_{X|S}(x|s)\, d(s, x) \le D_1$. Therefore $p_{X|S} \in \mathcal{Q}^{\rm stg}(p_S, D_1)$.

By (45), the random variable $W$ is defined over an alphabet of cardinality $\exp_2\{N[R + \log|\mathcal{S}|]\}$. Moreover, $(W, S) \to X \to Y$ forms a Markov chain. Combining (42) and (46), and neglecting the term $1/N$, which vanishes as $N \to \infty$, we further derive

$$\begin{aligned}
R &\le \frac{1}{N} \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} I(M;\mathbf{Y}) \\
&\le \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} \frac{1}{N}\sum_{i=1}^{N} \left[ I(W_i;Y_i) - I(W_i;S_i) \right] \\
&= \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} \left[ I(W;Y|T) - I(W;S|T) \right] \\
&= \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} \left[ I(W,T;Y) - I(W,T;S) - I(T;Y) + I(T;S) \right] \\
&\le \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} \left[ I(U;Y) - I(U;S) \right],
\end{aligned} \tag{47}$$

where $U = (W, T)$ is defined over an alphabet of cardinality

$$L(N) = N \exp_2\{N[R + \log|\mathcal{S}|]\}, \tag{48}$$

and the last inequality is due to $I(T;Y) \ge 0$ and $I(T;S) = 0$ (since $T$ is independent of $S$). Since $p_{X|S} \in \mathcal{Q}^{\rm stg}(p_S, D_1)$, we have $p_{XU|S} \in \mathcal{Q}^{\rm stg}(L(N), p_S, D_1)$.
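Recall that $J_L(p_S, p_{XU|S}, p_{Y|X}) = I(U;Y) - I(U;S)$ when $|\mathcal{U}| = L$, and that

$$C_L^{\rm stg} = \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_L(p_S, p_{XU|S}, p_{Y|X}).$$

Following the same arguments as in [33], the sequence $\{C_L^{\rm stg}\}$ is nondecreasing and converges to a finite limit

$$C^{\rm stg} \triangleq \lim_{L\to\infty} C_L^{\rm stg} = \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_L(p_S, p_{XU|S}, p_{Y|X}).$$

Therefore, continuing with (47), $R$ is bounded by

$$\begin{aligned}
R &\le \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} \left[ I(U;Y) - I(U;S) \right] \\
&= \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_{L(N)}(p_S, p_{XU|S}, p_{Y|X}) \\
&\le \sup_{L} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_L(p_S, p_{XU|S}, p_{Y|X}) \\
&= \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_L(p_S, p_{XU|S}, p_{Y|X}).
\end{aligned}$$

This proves the converse part of Theorem 1.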

Appendix III 
Proof of Proposition 2

We have

$$E_{r,L}^{\rm stg}(R) \le E_{r,L}^{\rm PubWM}(R).$$

Recall from [33, Lemma 3.1] that the sequence $E_{r,L}^{\rm PubWM}(R)$ is nondecreasing and converges to a finite limit $E_r^{\rm PubWM}(R)$ as $L \to \infty$. Using the same arguments as in [33, Lemma 3.1], it follows that the sequence $E_{r,L}^{\rm stg}(R)$ is nondecreasing and converges to a finite limit $E_r^{\rm stg}(R)$ as $L \to \infty$. Hence for any $\epsilon > 0$ and $R$, there exists $L(\epsilon)$ such that

$$E_{r,L}^{\rm stg}(R) > E_r^{\rm stg}(R) - \epsilon, \qquad \forall L > L(\epsilon).$$

We next prove that for any $L$, a sequence of deterministic codes $(f_N, \phi_N)$ with order-1 steganographic security exists with the property that

$$\lim_{N\to\infty} -\frac{1}{N}\log \max_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} P_e(f_N, \phi_N, p_{Y|X}) \ge E_{r,L}^{\rm stg}(R). \tag{49}$$

To prove the existence of such a code, we construct a random ensemble of binning codes $(f_N, \phi_N)$ with auxiliary alphabet $\mathcal{U} = \{1, 2, \ldots, L\}$ and show that the error probability averaged over the ensemble vanishes at rate $E_{r,L}^{\rm stg}(R)$ as $N$ goes to infinity. The proof is based on that of [33, Theorem 3.2], with special treatment of the encoder construction for perfect security.

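Assume that $R < C_L^{\rm stg} - \epsilon$. For any covertext type $p_{\mathbf{s}}$ and conditional type $p_{XU|S}$, define the function

$$E_{L,N}(R, p_{\mathbf{s}}, p_{XU|S}) = \min_{\tilde{p}_{Y|XUS}} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} \left[ D(p_{\mathbf{s}}\, p_{XU|S}\, \tilde{p}_{Y|XUS} \,\|\, p_S\, p_{XU|S}\, p_{Y|X}) + \left| I(\mathbf{u};\mathbf{y}) - I(\mathbf{u};\mathbf{s}) - \epsilon - R \right|^+ \right]. \tag{50}$$

Define $\mathcal{Q}^{\rm stg}(N, L, p_{\mathbf{s}}, D_1)$ as the set of conditional types $p_{XU|S}$ that also belong to the set $\mathcal{Q}^{\rm stg}(L, p_{\mathbf{s}}, D_1)$ of feasible steganographic channels. If $p_{XU|S} \in \mathcal{Q}^{\rm stg}(N, L, p_{\mathbf{s}}, D_1)$, then

(1) $p_{\mathbf{x}} = p_{\mathbf{s}}$, i.e., the stegotext sequence has the same type as the covertext sequence and the order-1 security condition is satisfied;

(2) $d^N(\mathbf{x}, \mathbf{s}) \le D_1$, i.e., the distortion is no greater than $D_1$ for any choice of $\mathbf{s}$ and $m$.

The set $\mathcal{Q}^{\rm stg}(N, L, p_{\mathbf{s}}, D_1)$ includes conditional types with $p_{X|US} = \mathbb{1}\{x = s\}$ and is therefore nonempty.

Now denote by $p^*_{XU|S}$ the maximizer of (50) over the set $\mathcal{Q}^{\rm stg}(N, L, p_{\mathbf{s}}, D_1)$. As a result of this optimization, we may associate

• to any covertext type $p_{\mathbf{s}}$, a type class $T^*_U(p_{\mathbf{s}}) = T_{\mathbf{u}}$ and a mutual information $I^*_{US}(p_{\mathbf{s}}) = I(\mathbf{u};\mathbf{s})$;

• to any covertext sequence $\mathbf{s}$, a conditional type class $T^*_{U|S}(\mathbf{s}) = T_{\mathbf{u}|\mathbf{s}}$;

• to any sequences $\mathbf{s}$ and $\mathbf{u} \in T^*_U(p_{\mathbf{s}})$, a conditional type class $T^*_{X|US}(\mathbf{u}, \mathbf{s}) = T_{\mathbf{x}|\mathbf{us}}$.

A random codebook $\mathcal{C}$ is the union of codeword arrays $\mathcal{C}(p_{\mathbf{s}})$ indexed by the covertext sequence type $p_{\mathbf{s}}$. Let $\rho(p_{\mathbf{s}}) = I^*_{US}(p_{\mathbf{s}}) + \epsilon$. The codeword array $\mathcal{C}(p_{\mathbf{s}})$ is obtained by drawing $2^{N[R + \rho(p_{\mathbf{s}})]}$ random vectors independently and uniformly from the corresponding type class $T^*_U(p_{\mathbf{s}})$, and arranging them in an array with $2^{N\rho(p_{\mathbf{s}})}$ rows and $2^{NR}$ columns indexed by messages.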

A. Encoder $f_N$

Given a codebook $\mathcal{C}$, a covertext sequence $\mathbf{s}$, and a message $m$, the encoder finds in $\mathcal{C}(p_{\mathbf{s}})$ an $l$ such that $\mathbf{u}(l, m) \in T^*_{U|S}(\mathbf{s})$. If more than one such $l$ exists, it picks one of them randomly (with uniform distribution). Let $\mathbf{u} = \mathbf{u}(l, m)$. If no such $l$ is available, the encoder declares an error and draws $\mathbf{u}$ from the uniform distribution over the conditional type class $T^*_{U|S}(\mathbf{s})$. Then $\mathbf{x}$ is drawn from the uniform distribution over the conditional type class $T^*_{X|US}(\mathbf{u}, \mathbf{s})$. Recalling the discussion below (50), $f_N$ satisfies both the order-1 steganographic security constraint and the maximum-distortion constraint.

B. Decoder $\phi_N$

Given $\mathbf{y}$ and the same codebook $\mathcal{C}$ used by the encoder, the decoder first seeks a covertext type $p_{\mathbf{s}}$ and $\mathbf{u} \in \mathcal{C}(p_{\mathbf{s}})$ that maximize the penalized mutual information criterion

$$\max_{p_{\mathbf{s}}} \max_{\mathbf{u} \in \mathcal{C}(p_{\mathbf{s}})} \left[ I(\mathbf{u};\mathbf{y}) - \rho(p_{\mathbf{s}}) \right]. \tag{51}$$

The decoder then outputs the column index $m$ that corresponds to $\mathbf{u}$. If there exist maximizers with more than one column index, the decoder declares an error.

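C. Error Probability Analysis

The probability of error is given by

$$P_{e,N} \triangleq \max_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} \Pr(\hat{M} \ne M) = \max_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} P_e(f_N, \phi_N, p_{Y|X}). \tag{52}$$

Following the steps in [33, Section 5], the encoding error vanishes double-exponentially. Only the decoding error, whose exponent is expressed through the function $E_{L,N}$ in (50), contributes to $P_{e,N}$ on the exponential scale.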
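As $N \to \infty$, by [33, Lemma 2.2], the above error exponent converges to

$$E_{r,L}^{\rm stg}(R) = \min_{\tilde{p}_S \in \mathcal{P}_S} \ \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, \tilde{p}_S, D_1)} \ \min_{\tilde{p}_{Y|XUS} \in \mathcal{P}_{Y|XUS}} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} \left[ D(\tilde{p}_S\, p_{XU|S}\, \tilde{p}_{Y|XUS} \,\|\, p_S\, p_{XU|S}\, p_{Y|X}) + \left| J_L(\tilde{p}_S, p_{XU|S}, \tilde{p}_{Y|XUS}) - R \right|^+ \right]. \tag{53}$$

Clearly, $E_{r,L}^{\rm stg}(R) \ge 0$, with equality if and only if the following conditions are met:

• the minimizing PMF $\tilde{p}_S$ is equal to $p_S$;

• the minimizing conditional PMF $\tilde{p}_{Y|XUS}$ is equal to $p_{Y|X}$; and

• $R \ge \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, p_S, D_1)} \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_L(p_S, p_{XU|S}, p_{Y|X}) = C_L^{\rm stg}$.

Therefore, $E_{r,L}^{\rm stg}(R) > 0$ and the error probability vanishes for any $R < C_L^{\rm stg}(D_1, D_2)$. This implies that the capacity is lower-bounded by

$$\lim_{L\to\infty} C_L^{\rm stg}(D_1, D_2).$$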

D. Perfect Security 

Having established the achievability of $E_{r,L}^{\rm stg}(R)$ and $C_L^{\rm stg}$ for a deterministic code $(f_N, \phi_N)$ with order-1 security and maximum distortion $D_1$, we invoke Proposition 1 to claim that the randomly modulated code with prototype $(f_N, \phi_N)$ achieves the same error probability (hence error exponent) and distortion as the prototype.

Appendix IV 
Proof of Proposition 3

By (18), $J_L(p_S, p_{XU|S}, p_{Y|X})$ is reduced to

$$J_L(p_S, p_{XU|S}, p_{Y|X}) = I(U;X) - I(U;S).$$

Choosing $U = X$ yields the lower bound

$$C^{\rm stg}(D_1, 0) \ge \max_{p_{X|S} \in \mathcal{Q}^{\rm stg}(p_S, D_1)} \left[ I(X;X) - I(X;S) \right] = \max_{p_{X|S} \in \mathcal{Q}^{\rm stg}(p_S, D_1)} H(X|S). \tag{54}$$

On the other hand,

$$J_L(p_S, p_{XU|S}, p_{Y|X}) = I(U;X) - I(U;S) \le I(U;X|S) \tag{55}$$

$$= H(X|S) - H(X|U,S) \le H(X|S). \tag{56}$$

Note that (55) follows from the chain rule of mutual information

$$I(U;X,S) = I(U;X) + I(U;S|X) = I(U;S) + I(U;X|S)$$

and $I(U;S|X) \ge 0$. Choosing $U = X$ achieves equality in both (55) and (56). From (56), we obtain

$$C^{\rm stg}(D_1, 0) = \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, p_S, D_1)} J_L(p_S, p_{XU|S}, p_{Y|X}) \le \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, p_S, D_1)} H(X|S) = \max_{p_{X|S} \in \mathcal{Q}^{\rm stg}(p_S, D_1)} H(X|S). \tag{57}$$

Combining (54) and (57) yields (19) and proves the proposition.

Appendix V 
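Proof of Proposition 4

Since $p_{Y|X} = \mathbb{1}\{y = x\}$, the divergence term $D(\tilde{p}_S\, p_{XU|S}\, \tilde{p}_{Y|XUS} \,\|\, p_S\, p_{XU|S}\, p_{Y|X})$ in (14) is infinite if $\tilde{p}_{Y|XUS} \ne p_{Y|X}$. Hence, the minimizing $\tilde{p}_{Y|XUS}$ is given by

$$\tilde{p}_{Y|XUS} = p_{Y|X} = \mathbb{1}\{y = x\}.$$

Consequently, the two terms of the cost function of (14) are reduced to

$$D(\tilde{p}_S\, p_{XU|S}\, \tilde{p}_{Y|XUS} \,\|\, p_S\, p_{XU|S}\, p_{Y|X}) = D(\tilde{p}_S \,\|\, p_S)$$

and

$$\left| J_L(\tilde{p}_S, p_{XU|S}, \tilde{p}_{Y|XUS}) - R \right|^+ = \left| J_L(\tilde{p}_S, p_{XU|S}, p_{Y|X}) - R \right|^+,$$

respectively. This yields

$$E_r^{\rm stg}(R) = \min_{\tilde{p}_S \in \mathcal{P}_S} \left[ D(\tilde{p}_S \,\|\, p_S) + \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, \tilde{p}_S, D_1)} \left| J_L(\tilde{p}_S, p_{XU|S}, p_{Y|X}) - R \right|^+ \right]. \tag{58}$$

Following the same steps as in the proof of Proposition 3, we derive that

$$\forall L \ge |\mathcal{S}|: \quad \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, \tilde{p}_S, D_1)} \left| J_L(\tilde{p}_S, p_{XU|S}, p_{Y|X}) - R \right|^+ = \max_{p_{X|S} \in \mathcal{Q}^{\rm stg}(\tilde{p}_S, D_1)} \left| H_{\tilde{p}_S p_{X|S}}(X|S) - R \right|^+. \tag{59}$$

The maximum on the left side is achieved by $U = X$. Combining (58) and (59) proves the proposition.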

Appendix VI 
Proof of Proposition 5

We prove Proposition 5 in two parts. We first establish that the right-hand side of (29) is an upper bound on the public watermarking capacity $C^{\rm PubWM}$. Then we prove that the right-hand side of (29) is at the same time a lower bound on the perfectly secure steganographic capacity $C^{\rm stg}$.

We start with the following lemma on the properties of $p_{XU|S} \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)$, which are used throughout this proof.

Lemma 1: Any $p_{XU|S} \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)$ generated by (24) from its corresponding $p_{XV|S} \in \mathcal{Q}(L, p_S, D_1)$ has the following properties:

(i) $p_{S|U}(s \,|\, qv + i) = p_{S|V}(\underline{s - i} \,|\, v)$, $\forall i, s \in \mathcal{S}$ and $\forall v \in \mathcal{V}$;

(ii) $p_{X|U}(x \,|\, qv + i) = p_{X|V}(\underline{x - i} \,|\, v)$, $\forall i, x \in \mathcal{S}$ and $\forall v \in \mathcal{V}$;

(iii) $p_U(qv + i) = \frac{1}{q}\, p_V(v)$, $\forall i \in \mathcal{S}$, $v \in \mathcal{V}$, where $p_U$ (resp. $p_V$) is the marginal PMF of $U$ (resp. $V$) induced from $p_{XU|S}$ (resp. $p_{XV|S}$) and $p_S = \mathcal{U}(\mathcal{S})$; and

(iv) $p_X = \mathcal{U}(\mathcal{S})$, where $p_X$ is the marginal PMF of $X$ induced from $p_{XU|S}$ and $p_S = \mathcal{U}(\mathcal{S})$.

It is straightforward to verify Lemma 1(i)-(iv) from (24).

A. Upper Bound 

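For the capacity of the public watermarking game,

$$\begin{aligned}
C^{\rm PubWM}(D_1, D_2) &= \lim_{L\to\infty} \max_{p_{XV|S} \in \mathcal{Q}(L, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_L(p_S, p_{XV|S}, p_{Y|X}) \\
&\le \lim_{L\to\infty} \max_{p_{XV|S} \in \mathcal{Q}(L, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)} J_L(p_S, p_{XV|S}, p_{Y|X}),
\end{aligned} \tag{60}$$

since $\mathcal{A}_{\rm cyc}(D_2) \subset \mathcal{A}(p_X, D_2)$ by (28).

Given any $p_{XV|S} \in \mathcal{Q}(L, p_S, D_1)$ and its associated $p_{XU|S} \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)$, we first verify that

$$I(S;U) = I(S;V). \tag{61}$$

From $p_S = \mathcal{U}(\mathcal{S})$ and $p_{XV|S}$, we obtain

$$H(S|V) = -\sum_{v=0}^{L-1} p_V(v) \sum_{s=0}^{q-1} p_{S|V}(s|v) \log p_{S|V}(s|v). \tag{62}$$

From $p_S = \mathcal{U}(\mathcal{S})$ and $p_{XU|S}$, we have

$$H(S|U) = -\sum_{v=0}^{L-1} \sum_{i=0}^{q-1} p_U(qv + i) \sum_{s=0}^{q-1} p_{S|U}(s \,|\, qv + i) \log p_{S|U}(s \,|\, qv + i)$$

$$\overset{(a)}{=} -\sum_{v=0}^{L-1} \sum_{i=0}^{q-1} \frac{1}{q}\, p_V(v) \sum_{s=0}^{q-1} p_{S|V}(\underline{s - i} \,|\, v) \log p_{S|V}(\underline{s - i} \,|\, v) \tag{63}$$

$$= \frac{1}{q}\sum_{i=0}^{q-1} H(S|V) = H(S|V), \tag{64}$$

where (a) is obtained by using Lemma 1(i) and (iii). Since $I(S;U) = H(S) - H(S|U)$ and $I(S;V) = H(S) - H(S|V)$, (61) follows from (64).

For the pair $(p_{XV|S}, p_{Y|X}) \in \mathcal{Q}(L, p_S, D_1) \times \mathcal{A}_{\rm cyc}(D_2)$ and its associated pair $(p_{XU|S}, p_{Y|X}) \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1) \times \mathcal{A}_{\rm cyc}(D_2)$, we have the following lemma, which is proved in Appendix VII.

Lemma 2:

$$I_{p_S, p_{XV|S}, p_{Y|X}}(Y;V) \le I_{p_S, p_{XU|S}, p_{Y|X}}(Y;U). \tag{65}$$

From (61), Lemma 2, and the definition of $J_L$ in (10), we obtain

$$J_L(p_S, p_{XV|S}, p_{Y|X}) \le J_{qL}(p_S, p_{XU|S}, p_{Y|X}), \tag{66}$$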
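which yields

$$\lim_{L\to\infty} \max_{p_{XV|S} \in \mathcal{Q}(L, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)} J_L(p_S, p_{XV|S}, p_{Y|X}) \le \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)} J_{qL}(p_S, p_{XU|S}, p_{Y|X}). \tag{67}$$

Therefore, (60) and (67) yield

$$C^{\rm PubWM}(D_1, D_2) \le \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)} J_{qL}(p_S, p_{XU|S}, p_{Y|X}). \tag{68}$$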

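B. Lower Bound

Using the same argument as at the end of Appendix II for the sequence $\{C_L^{\rm stg}(D_1, D_2)\}$, we can argue that the sequence $\{C_L^{\rm PubWM}(D_1, D_2)\}$ is also nondecreasing and bounded by $\log|\mathcal{S}|$. Therefore, $\{C_L^{\rm PubWM}(D_1, D_2)\}$ and any of its subsequences converge to the same limit. That is,

$$\begin{aligned}
C^{\rm PubWM}(D_1, D_2) &= \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}(L, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_L(p_S, p_{XU|S}, p_{Y|X}) \\
&= \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}(qL, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_{qL}(p_S, p_{XU|S}, p_{Y|X}).
\end{aligned} \tag{69}$$

Similarly,

$$\begin{aligned}
C^{\rm stg}(D_1, D_2) &= \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(L, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_L(p_S, p_{XU|S}, p_{Y|X}) \\
&= \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}(qL, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_{qL}(p_S, p_{XU|S}, p_{Y|X}).
\end{aligned} \tag{70}$$

From (26),

$$\mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1) \subset \mathcal{Q}^{\rm stg}(qL, p_S, D_1) \subset \mathcal{Q}(qL, p_S, D_1).$$

Thus, we have

$$C^{\rm stg}(D_1, D_2) \ge \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_{qL}(p_S, p_{XU|S}, p_{Y|X}). \tag{71}$$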

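Given $p_{Y|X} \in \mathcal{A}(p_X, D_2)$, we define $q$ conditional PMFs:

$$p^{(m)}_{Y|X}(y|x) \triangleq p_{Y|X}(\underline{y - m} \,|\, \underline{x - m}), \qquad \forall x, y \in \mathcal{S},\ 0 \le m < q. \tag{72}$$

Since the distortion matrix $\{d(i, j)\}_{i,j=0}^{q-1}$ is cyclic and $p_X$ is uniform (Lemma 1(iv)), it is easy to verify that all $q$ conditional PMFs $p^{(m)}_{Y|X}$ belong to $\mathcal{A}(p_X, D_2)$.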
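The conditional PMF $p^{(m)}_{Y|U}$ induced by $(p_{XU|S}, p^{(m)}_{Y|X}) \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1) \times \mathcal{A}(p_X, D_2)$ is given by

$$p^{(m)}_{Y|U}(y \,|\, qv + i) = \sum_{x=0}^{q-1} p_{X|U}(x \,|\, qv + i)\, p^{(m)}_{Y|X}(y|x)$$

$$= \sum_{x=0}^{q-1} p_{X|U}(x \,|\, qv + i)\, p_{Y|X}(\underline{y - m} \,|\, \underline{x - m}) \tag{73}$$

$$= \sum_{x=0}^{q-1} p_{X|V}(\underline{x - i} \,|\, v)\, p_{Y|X}(\underline{y - m} \,|\, \underline{x - m}) \tag{74}$$

$$= \sum_{x=0}^{q-1} p_{X|U}(\underline{x - m} \,|\, qv + \underline{i - m})\, p_{Y|X}(\underline{y - m} \,|\, \underline{x - m}) \tag{75}$$

$$= p_{Y|U}(\underline{y - m} \,|\, qv + \underline{i - m}), \qquad \forall y, i \in \mathcal{S},\ v \in \mathcal{V}, \tag{76}$$

where (73) follows from the definition (72), and both (74) and (75) follow by applying Lemma 1(ii). We also obtain the marginal PMF of $Y$ as

$$p^{(m)}_Y(y) = \sum_{v=0}^{L-1}\sum_{i=0}^{q-1} p_U(qv + i)\, p^{(m)}_{Y|U}(y \,|\, qv + i)$$

$$= \sum_{v=0}^{L-1}\sum_{i=0}^{q-1} p_U(qv + \underline{i - m})\, p_{Y|U}(\underline{y - m} \,|\, qv + \underline{i - m}) \tag{77}$$

$$= p_Y(\underline{y - m}), \qquad \forall y \in \mathcal{S}, \tag{78}$$

where (77) follows from Lemma 1(iii) and (76). From (76) and (78), we obtain

$$I_{p_S, p_{XU|S}, p^{(m)}_{Y|X}}(Y;U) = I_{p_S, p_{XU|S}, p_{Y|X}}(Y;U) \tag{79}$$

and hence

$$J_{qL}(p_S, p_{XU|S}, p^{(m)}_{Y|X}) = J_{qL}(p_S, p_{XU|S}, p_{Y|X}) \tag{80}$$

for $0 \le m < q$.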

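Let $\bar{p}_{Y|X} = \frac{1}{q}\sum_{m=0}^{q-1} p^{(m)}_{Y|X}$. It is easy to check that $\bar{p}_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)$. Also,

$$J_{qL}(p_S, p_{XU|S}, p_{Y|X}) = \frac{1}{q}\sum_{m=0}^{q-1} J_{qL}(p_S, p_{XU|S}, p^{(m)}_{Y|X}) \tag{81}$$

$$\ge J_{qL}\Big(p_S, p_{XU|S}, \frac{1}{q}\sum_{m=0}^{q-1} p^{(m)}_{Y|X}\Big) = J_{qL}(p_S, p_{XU|S}, \bar{p}_{Y|X}), \tag{82}$$

where the inequality comes from the fact that, for fixed $p_S$ and $p_{XU|S}$, $J_{qL}(p_S, p_{XU|S}, p_{Y|X})$ is convex in $p_{Y|X}$ [27, Proposition 4.1(iii)]. Therefore, from (71) and (82) we have

$$\begin{aligned}
C^{\rm stg}(D_1, D_2) &\ge \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}(p_X, D_2)} J_{qL}(p_S, p_{XU|S}, p_{Y|X}) \\
&\ge \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)} J_{qL}(p_S, p_{XU|S}, p_{Y|X}).
\end{aligned} \tag{83}$$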
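Combining the upper-bound inequality in (68), the lower-bound inequality in (83), and $C^{\rm stg} \le C^{\rm PubWM}$ from (22), we prove the claim

$$C^{\rm PubWM}(D_1, D_2) = C^{\rm stg}(D_1, D_2) = \lim_{L\to\infty} \max_{p_{XU|S} \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1)} \ \min_{p_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)} J_{qL}(p_S, p_{XU|S}, p_{Y|X}), \tag{84}$$

which means that the perfect-security constraint does not cause any capacity loss.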

Appendix VII 
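Proof of Lemma 2

For the pair $(p_{XV|S}, p_{Y|X}) \in \mathcal{Q}(L, p_S, D_1) \times \mathcal{A}_{\rm cyc}(D_2)$, the conditional PMF of $Y$ given $V$ is

$$p_{Y|V}(y|v) = \sum_{x=0}^{q-1} p_{X|V}(x|v)\, p_{Y|X}(y|x)$$

$$= \sum_{x=0}^{q-1} p_{X|V}(x|v)\, p_{Y|X}(\underline{y - x} \,|\, 0), \qquad \forall y \in \mathcal{S},\ v \in \mathcal{V}, \tag{85}$$

where (85) follows from (27) in Definition 11 for $p_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)$. The conditional entropy of $Y$ given $V$ is

$$H(Y|V) = -\sum_{v=0}^{L-1} p_V(v) \sum_{y=0}^{q-1} p_{Y|V}(y|v) \log p_{Y|V}(y|v). \tag{86}$$

For the associated pair $(p_{XU|S}, p_{Y|X}) \in \mathcal{Q}^{\rm stg}_{\rm cyc}(qL, p_S, D_1) \times \mathcal{A}_{\rm cyc}(D_2)$, the conditional PMF of $Y$ given $U$ is

$$p_{Y|U}(y \,|\, qv + i) = \sum_{x=0}^{q-1} p_{X|U}(x \,|\, qv + i)\, p_{Y|X}(y|x)$$

$$= \sum_{x=0}^{q-1} p_{X|V}(\underline{x - i} \,|\, v)\, p_{Y|X}\big(\underline{(y - i) - (x - i)} \,\big|\, 0\big) \tag{87}$$

$$= p_{Y|V}(\underline{y - i} \,|\, v), \qquad \forall y, i \in \mathcal{S},\ v \in \mathcal{V}, \tag{88}$$

where to obtain (87) we have used Lemma 1(ii) and (27) in Definition 11 for $p_{Y|X} \in \mathcal{A}_{\rm cyc}(D_2)$, and (88) follows from (85). The marginal PMF of $Y$ is given by

$$p_Y(y) = \sum_{v=0}^{L-1}\sum_{i=0}^{q-1} p_U(qv + i)\, p_{Y|U}(y \,|\, qv + i)$$

$$= \sum_{v=0}^{L-1}\sum_{i=0}^{q-1} \frac{1}{q}\, p_V(v)\, p_{Y|V}(\underline{y - i} \,|\, v) \tag{89}$$

$$= \frac{1}{q}, \qquad \forall y \in \mathcal{S}, \tag{90}$$

where (89) follows from Lemma 1(iii) and (88). The conditional entropy of $Y$ given $U$ is

$$H(Y|U) = -\sum_{v=0}^{L-1}\sum_{i=0}^{q-1} p_U(qv + i) \sum_{y=0}^{q-1} p_{Y|U}(y \,|\, qv + i) \log p_{Y|U}(y \,|\, qv + i)$$

$$= -\sum_{v=0}^{L-1}\sum_{i=0}^{q-1} \frac{1}{q}\, p_V(v) \sum_{y=0}^{q-1} p_{Y|V}(\underline{y - i} \,|\, v) \log p_{Y|V}(\underline{y - i} \,|\, v) \tag{91}$$

$$= \frac{1}{q}\sum_{i=0}^{q-1} H(Y|V) = H(Y|V), \tag{92}$$

where (91) follows from Lemma 1(iii) and (88), and (92) follows from (86). Since $p_Y(y) = 1/q$ for any $y \in \mathcal{S}$, as shown in (90), we have

$$H_{p_Y}(Y) = \log q \ge H_{\tilde{p}_Y}(Y), \tag{93}$$

where $p_Y$ and $\tilde{p}_Y$ are the marginal PMFs of $Y$ for $(p_S, p_{XU|S}, p_{Y|X})$ and $(p_S, p_{XV|S}, p_{Y|X})$, respectively. Therefore, from (92) and (93), we obtain

$$I_{p_S, p_{XU|S}, p_{Y|X}}(Y;U) = H_{p_Y}(Y) - H(Y|U) \tag{94}$$

$$\ge H_{\tilde{p}_Y}(Y) - H(Y|V) \tag{95}$$

$$= I_{p_S, p_{XV|S}, p_{Y|X}}(Y;V). \tag{96}$$

Hence, Lemma 2 is proved.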
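Identity (61) and Lemma 2 can be spot-checked numerically. The sketch below draws a random $p_{XV|S}$ and a random cyclic attack channel on $\mathbb{Z}_3$, forms the cyclic extension (24), and compares $I(S;U)$ with $I(S;V)$ and $I(Y;U)$ with $I(Y;V)$ under a uniform covertext. Alphabet sizes, seeds, and helper names are illustrative assumptions. A numerical check on random instances supports the lemma but is no substitute for the proof above.

```python
import math
import random

def random_covert_channel(q, L, rng):
    """Random p_{XV|S} stored as chan[s][x][v]; each row s sums to 1 over (x, v)."""
    chan = []
    for _ in range(q):
        w = [[rng.random() for _ in range(L)] for _ in range(q)]
        total = sum(sum(row) for row in w)
        chan.append([[val / total for val in row] for row in w])
    return chan

def cyclic_extension(chan, q, L):
    """Eq. (24): p_{XU|S}(x, q*v + i | s) = (1/q) p_{XV|S}((x - i) mod q, v | (s - i) mod q)."""
    ext = [[[0.0] * (q * L) for _ in range(q)] for _ in range(q)]
    for s in range(q):
        for x in range(q):
            for v in range(L):
                for i in range(q):
                    ext[s][x][q * v + i] = chan[(s - i) % q][(x - i) % q][v] / q
    return ext

def cyclic_channel(row):
    """p_{Y|X}(y|x) = row[(y - x) mod q], cf. (27)."""
    q = len(row)
    return [[row[(y - x) % q] for y in range(q)] for x in range(q)]

def mutual_info(pab):
    """I(A;B) in bits from a joint PMF given as rows pab[a][b]."""
    pa = [sum(row) for row in pab]
    pb = [sum(col) for col in zip(*pab)]
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for a, row in enumerate(pab) for b, p in enumerate(row) if p > 0)

def info_terms(chan, attack, q):
    """(I(S;W), I(Y;W)) for uniform p_S, covert channel chan[s][x][w] and attack[x][y]."""
    nw = len(chan[0][0])
    p_sw = [[sum(chan[s][x][w] for x in range(q)) / q for w in range(nw)] for s in range(q)]
    p_yw = [[sum(chan[s][x][w] * attack[x][y] for s in range(q) for x in range(q)) / q
             for w in range(nw)] for y in range(q)]
    return mutual_info(p_sw), mutual_info(p_yw)

rng = random.Random(2)
q, L = 3, 2
pxv = random_covert_channel(q, L, rng)
pxu = cyclic_extension(pxv, q, L)
row = [rng.random() for _ in range(q)]
attack = cyclic_channel([r / sum(row) for r in row])
i_sv, i_yv = info_terms(pxv, attack, q)
i_su, i_yu = info_terms(pxu, attack, q)
print(i_sv, i_su)   # equal up to rounding, as in (61)
print(i_yv, i_yu)   # I(Y;V) <= I(Y;U), as in (65)
```

Because $I(S;U) = I(S;V)$ and $I(Y;U) \ge I(Y;V)$, the same run also illustrates (66), $J_L \le J_{qL}$.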